SARSA is a model-free, on-policy reinforcement learning method.
SARSA is used when an agent must learn a decision-making policy by interacting with an environment, such as in robotics or game playing. It updates the action-value function using the full state-action-reward-next state-next action tuple (S, A, R, S', A') that gives the algorithm its name, rather than the state-action-reward-next state tuple used by Q-learning.
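A minimal sketch of a single tabular SARSA update may make this concrete. The function name, table layout, and hyperparameter values below are illustrative assumptions, not part of any particular library:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One SARSA step: move Q[s, a] toward the reward plus the discounted
    value of the next state-action pair actually chosen by the policy."""
    td_target = r + gamma * Q[s_next, a_next]   # uses the action taken, not the max
    Q[s, a] += alpha * (td_target - Q[s, a])    # temporal-difference update
    return Q

# Example usage on a toy table with 5 states and 2 actions (illustrative values):
Q = np.zeros((5, 2))
Q = sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
```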
The key difference between SARSA and Q-learning is how the update target is computed. Q-learning bootstraps from the maximum estimated value over the next state's actions, whereas SARSA bootstraps from the action actually taken according to the current policy. This means SARSA's value estimates reflect the exploration policy, not just the optimal actions.
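The contrast between the two targets can be shown side by side. This is a hedged sketch with assumed variable names, where `Q` is a NumPy array indexed by state and action:

```python
import numpy as np

def sarsa_target(Q, r, s_next, a_next, gamma=0.99):
    # SARSA (on-policy): bootstrap from the next action the behavior policy actually selected.
    return r + gamma * Q[s_next, a_next]

def q_learning_target(Q, r, s_next, gamma=0.99):
    # Q-learning (off-policy): bootstrap from the greedy next action,
    # regardless of what the behavior policy will actually do.
    return r + gamma * np.max(Q[s_next])
```

Because the SARSA target includes the consequences of exploratory actions, the learned values tend to be more conservative than Q-learning's in risky regions of the state space.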
For example, a mouse learning to navigate a maze with SARSA tends to avoid dangerous paths, because its value estimates account for the occasional exploratory misstep its own policy will make, whereas Q-learning might always favor the shortest path regardless of the risk incurred while exploring.
In summary, SARSA learns policies whose values reflect the agent's own balance of exploration and exploitation, making it robust in environments where safety and risk are considerations.
- Alias
  - State-Action-Reward-State-Action (SARSA)
- Related terms
  - Reinforcement Learning
  - Q-learning