SARSA
SARSA (State-Action-Reward-State-Action) is an on-policy reinforcement learning algorithm that updates its value estimates based on the actions it actually takes. The name comes from the sequence of information used in each update: the agent observes the current state (S), takes an action (A), receives a reward (R), arrives in a new state (S), and then selects its next action (A) before updating its estimate. Unlike Q-learning, which bootstraps from the best available action in the next state, SARSA bootstraps from the action the agent will actually take next, including any exploratory random actions. As a result, SARSA learns the value of the policy it is actually following (exploration included), which tends to make it more conservative than Q-learning in risky environments.
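The update described above can be written as Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)], where a' is the next action actually selected. The sketch below illustrates this with tabular SARSA and epsilon-greedy exploration on a toy corridor environment; the environment, hyperparameters, and function names are illustrative, not part of any standard library.

```python
import random

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One SARSA update: move Q[s][a] toward r + gamma * Q[s_next][a_next].

    The target uses the action a_next the agent will actually take
    (which may be exploratory), not the max over actions as in Q-learning.
    """
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])

def epsilon_greedy(Q, s, n_actions, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])

# Toy 5-state corridor: action 1 moves right, action 0 moves left.
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    done = (s_next == GOAL)
    return s_next, (1.0 if done else 0.0), done

random.seed(0)
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(200):
    s = 0
    a = epsilon_greedy(Q, s, N_ACTIONS)
    done = False
    while not done:
        s_next, r, done = step(s, a)
        # Select a_next BEFORE updating -- this is what makes it SARSA.
        a_next = epsilon_greedy(Q, s_next, N_ACTIONS)
        sarsa_update(Q, s, a, r, s_next, a_next)
        s, a = s_next, a_next

print([max(row) for row in Q])
```

After training, the greedy policy prefers moving right toward the goal, and the learned values reflect the epsilon-greedy behavior policy rather than the purely greedy one.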