Successor Representation

Sunday, September 21, 2025

Successor Representation (SR) is a reinforcement learning framework that decomposes value functions into two separate components: a representation of future state occupancy and immediate rewards. Instead of directly learning the value of being in a state, SR learns the expected discounted future visitation frequencies—essentially asking "if I start in state s and follow my policy, how much time will I spend in each other state?" This representation, combined with separate reward predictions, creates a middle ground between model-free methods (like Q-Learning) and model-based methods, enabling faster adaptation when rewards change but the environment dynamics remain constant.

How Successor Representation Works

The SR maintains a matrix M(s,s') that represents the expected discounted number of times the agent will visit state s' after starting in state s while following its policy. The value of a state is then computed by combining this successor representation with a separate reward vector:

V(s) = Σ_{s'} M(s,s') · r(s'),

where r(s') is the expected immediate reward in each state. When learning, the SR updates using a TD-like rule:

M(s,s') ← M(s,s') + α[I(s') + γM(s_next, s') - M(s,s')],

where I(s') is an indicator that equals 1 if s' is the current state s and 0 otherwise, and s_next is the successor state observed after one step from s. This separates learning the structure of the environment (encoded in M) from learning rewards (encoded in r), allowing the agent to reuse knowledge about "where will I go" even when "what will I get there" changes.
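As a concrete sketch, the TD-like rule above can be run in tabular form. The 5-state ring, fixed policy, step size, and step count below are illustrative choices, not from the text; for a fixed policy the exact SR also has a closed form, M = (I - γP)⁻¹, which the TD estimate should approach.

```python
import numpy as np

# Tabular SR learning on an illustrative 5-state ring MDP (all parameters
# here are assumptions made for the example, assuming NumPy is available).
np.random.seed(0)
n_states, gamma, alpha = 5, 0.9, 0.02
I = np.eye(n_states)

# Policy-induced dynamics: move to the next state with prob 0.9, stay with 0.1.
P = np.zeros((n_states, n_states))
for s in range(n_states):
    P[s, (s + 1) % n_states] = 0.9
    P[s, s] = 0.1

M = np.zeros((n_states, n_states))  # successor representation, one row per state
s = 0
for _ in range(200_000):
    s_next = np.random.choice(n_states, p=P[s])
    # M(s, ·) += alpha * [one-hot indicator of s  +  gamma * M(s_next, ·)  -  M(s, ·)]
    M[s] += alpha * (I[s] + gamma * M[s_next] - M[s])
    s = s_next

# Closed-form SR for this fixed policy, for comparison: M = (I - gamma * P)^(-1).
M_exact = np.linalg.inv(I - gamma * P)

# Values come from pairing M with a separate reward vector: V = M @ r.
r = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
V = M @ r
```

Note that each update touches a whole row of M at once: the one-hot indicator credits the current state, and bootstrapping from the next state's row propagates expected future occupancies backward, exactly as TD propagates values.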

When to Use Successor Representation

Successor Representation excels in environments where the transition dynamics stay constant but reward functions change frequently, such as navigation tasks where the layout is fixed but goals shift, or resource gathering where item locations are stable but their values fluctuate. It enables rapid transfer learning and policy adaptation without retraining from scratch, since only the reward predictions need updating while the successor representation remains valid. SR also has notable connections to neuroscience, as it mirrors how the hippocampus may encode a predictive map of space. The tradeoff is increased memory (a full matrix rather than a single value per state) and added computation, which makes tabular SR most practical for moderately sized state spaces; function approximation can compress the representation for larger ones. When rewards are non-stationary, or when solving multiple tasks in the same environment, these costs are usually well worth paying.
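The reuse argument can be seen in a few lines: once M is known, a reward change only requires recomputing M · r, with no new experience of the dynamics. A minimal sketch, assuming a 4-state chain with an absorbing end state and the closed-form SR (the chain, policy, and reward vectors are illustrative assumptions):

```python
import numpy as np

gamma = 0.9
# Policy-induced transitions: 0 -> 1 -> 2 -> 3, with state 3 absorbing.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
# Closed-form SR for this fixed policy: M = (I - gamma * P)^(-1).
M = np.linalg.inv(np.eye(4) - gamma * P)

r_old = np.array([0.0, 0.0, 0.0, 1.0])  # reward at the end of the chain
r_new = np.array([0.0, 1.0, 0.0, 0.0])  # reward moves; dynamics do not

V_old = M @ r_old
V_new = M @ r_new  # re-evaluated instantly: M needed no new learning
```

A model-free learner in the same situation would have to relearn V (or Q) from fresh experience after the reward moves, while a full model-based learner would replan from scratch; SR sits between them, amortizing the dynamics into M and paying only for the reward update.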