Q-Learning
Q-Learning is an off-policy reinforcement learning algorithm that learns the optimal action-value function independently of the policy the agent actually follows. The "Q" stands for quality: Q(s, a) estimates how good it is to take action a in state s. Unlike SARSA, Q-Learning builds its update target from the greedy (highest-valued) action in the next state, regardless of which action the exploratory behavior policy takes there. This separation allows it to learn the best possible strategy even while taking random exploratory actions during training.
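As a rough illustration of the off-policy update described above, here is a minimal tabular sketch in Python. The function name `q_learning_update`, the dictionary-based table `Q`, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for this example, not part of the original note.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-Learning update in place and return the new value.

    Q is a mapping from (state, action) pairs to estimated values.
    All names and defaults here are illustrative assumptions.
    """
    # Off-policy target: use the best available next action,
    # no matter which action the agent will actually take next.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Example usage with a hypothetical two-action problem.
Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 2.0
new_value = q_learning_update(Q, "s0", "right", 1.0, "s1",
                              actions=["left", "right"],
                              alpha=0.5, gamma=0.9)
```

Because the target takes the maximum over next actions, the update is the same whether the agent's next move is greedy or exploratory.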
Thu Sep 25 2025
SARSA
SARSA (State-Action-Reward-State-Action) is an on-policy reinforcement learning algorithm that learns by updating its value estimates based on the actions it actually takes. The name comes from the sequence of information it uses: it observes the current state (S), takes an action (A), receives a reward (R), moves to a new state (S), and then selects the next action (A) before updating its estimate. Unlike Q-Learning, which always assumes the greedy action will be taken in the next state, SARSA updates its estimates based on the action it will actually take next, including any random exploratory actions.
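The on-policy update described above can be sketched in a few lines of Python. This is a minimal tabular version under assumed names: `sarsa_update`, the dictionary-based table `Q`, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are illustrative choices, not part of the original note.

```python
from collections import defaultdict


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular SARSA update in place and return the new value.

    Q is a mapping from (state, action) pairs to estimated values.
    All names and defaults here are illustrative assumptions.
    """
    # On-policy target: use the value of the action the agent has
    # actually selected for the next step, exploratory or not.
    next_value = 0.0 if done else Q[(next_state, next_action)]
    td_target = reward + gamma * next_value
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Example usage: the agent has chosen the lower-valued "left"
# as its next action, so that value enters the target.
Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 2.0
new_value = sarsa_update(Q, "s0", "right", 1.0, "s1", "left",
                         alpha=0.5, gamma=0.9)
```

The only difference from a Q-Learning style update is the target: SARSA plugs in the value of the chosen next action rather than the maximum over all next actions.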
Sun Sep 21 2025