Questions tagged [sarsa]

SARSA (State-Action-Reward-State-Action) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning.

SARSA (State-Action-Reward-State-Action) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning.

Algorithm Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha [r_{t+1} + \gamma Q(s_{t+1}, a_{t+1})-Q(s_t,a_t)] A SARSA agent will interact with the environment and update the policy based on actions taken, known as an on-policy learning algorithm. As expressed above, the Q value for a state-action is updated by an error, adjusted by the learning rate alpha. Q values represent the possible reward received in the next time step for taking action a in state s, plus the discounted future reward received from the next state-action observation. Watkin's Q-learning was created as an alternative to the existing temporal difference technique and which updates the policy based on the maximum reward of available actions. The difference may be explained as SARSA learns the Q values associated with taking the policy it follows itself, while Watkin's Q-learning learns the Q values associated with taking the exploitation policy while following an exploration/exploitation policy. For further information on the exploration/exploitation trade off, see reinforcement learning.

Some optimizations of Watkin's Q-learning may also be applied to SARSA, for example in the paper "Fast Online Q(λ)" (Wiering and Schmidhuber, 1998) the small differences needed for SARSA(λ) implementations are described as they arise.

33 questions
0
votes
1 answer

Implementing SARSA in Unity

So I've used following code to implement Q-learning in Unity: using System; using System.Collections; using System.Collections.Generic; using System.Linq; using UnityEngine; namespace QLearner { public class QLearnerScript { …
0
votes
1 answer

Zeta Variable of SARSA(lamda)

What does zeta represent in the critic method? I believe it keeps track of the state-action pairs and represents eligibility traces, which are a temporary record of the state-actions, but what exactly does zeta represent and how does it look in c++…
anon
  • 543
  • 4
  • 15
0
votes
1 answer

How to prevent the eligibility trace in SARSA with lambda = 1 from exploding for state-action pairs that are visited a huge number of times?

I was testing SARSA with lambda = 1 with Windy Grid World and if the exploration causes the same state-action pair to be visited many times before reaching the goal, the eligibility trace gets incremented each time without any decay, therefore it…
1 2
3