SARSA
SARSA (State-Action-Reward-State-Action) is an on-policy reinforcement learning algorithm that updates its value estimates based on the actions it actually takes. The name comes from the sequence of information used in each update: the agent observes the current state (S), takes an action (A), receives a reward (R), arrives in a new state (S), and then selects its next action (A) before updating its estimate. Unlike Q-learning, which bootstraps from the best available action in the next state, SARSA bootstraps from the action the agent will actually take next, including any exploratory random actions. As a result, SARSA learns the value of the policy it is actually following (exploration included), which tends to make it more conservative than Q-learning in risky environments.
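The update described above can be written as Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)], where a' is the next action actually selected. The sketch below illustrates this with tabular SARSA and epsilon-greedy exploration on a toy corridor environment; the environment, hyperparameters, and function names are illustrative, not part of any standard library.

```python
import random

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One SARSA update: move Q[s][a] toward r + gamma * Q[s_next][a_next].

    The target uses the action a_next the agent will actually take
    (which may be exploratory), not the max over actions as in Q-learning.
    """
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])

def epsilon_greedy(Q, s, n_actions, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])

# Toy 5-state corridor: action 1 moves right, action 0 moves left.
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    done = (s_next == GOAL)
    return s_next, (1.0 if done else 0.0), done

random.seed(0)
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(200):
    s = 0
    a = epsilon_greedy(Q, s, N_ACTIONS)
    done = False
    while not done:
        s_next, r, done = step(s, a)
        # Select a_next BEFORE updating -- this is what makes it SARSA.
        a_next = epsilon_greedy(Q, s_next, N_ACTIONS)
        sarsa_update(Q, s, a, r, s_next, a_next)
        s, a = s_next, a_next

print([max(row) for row in Q])
```

After training, the greedy policy prefers moving right toward the goal, and the learned values reflect the epsilon-greedy behavior policy rather than the purely greedy one.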