Questions tagged [sarsa]

SARSA (State-Action-Reward-State-Action) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning.

Algorithm

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)]

A SARSA agent interacts with the environment and updates the policy based on the actions it actually takes, which makes it an on-policy learning algorithm. As the update rule above shows, the Q value for a state-action pair is corrected by an error term, scaled by the learning rate alpha. Q values represent the possible reward received in the next time step for taking action a in state s, plus the discounted future reward received from the next state-action observation.

Watkins's Q-learning was created as an alternative to the existing temporal-difference techniques; it updates the policy based on the maximum reward over the available actions. The difference may be summarized as follows: SARSA learns the Q values associated with the policy it follows itself, while Watkins's Q-learning learns the Q values associated with the exploitation (greedy) policy while following an exploration/exploitation policy. For further information on the exploration/exploitation trade-off, see the reinforcement-learning tag.
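A minimal tabular sketch of this update loop, in Python (the gym-style env interface with reset()/step() and the epsilon-greedy helper are assumptions for illustration, not part of the tag wiki):

    import numpy as np

    def epsilon_greedy(Q, state, n_actions, epsilon, rng):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[state]))

    def sarsa(env, n_states, n_actions, episodes=500,
              alpha=0.1, gamma=0.99, epsilon=0.1):
        # Tabular SARSA: on-policy TD(0) control.
        # env is assumed gym-style: reset() -> state,
        # step(action) -> (next_state, reward, done).
        rng = np.random.default_rng(0)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s = env.reset()
            a = epsilon_greedy(Q, s, n_actions, epsilon, rng)
            done = False
            while not done:
                s_next, r, done = env.step(a)
                a_next = epsilon_greedy(Q, s_next, n_actions, epsilon, rng)
                # On-policy target: bootstrap from the action actually chosen next.
                target = r + gamma * Q[s_next, a_next] * (not done)
                Q[s, a] += alpha * (target - Q[s, a])
                s, a = s_next, a_next
        return Q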

Some optimizations of Watkins's Q-learning can also be applied to SARSA; for example, the paper "Fast Online Q(λ)" (Wiering and Schmidhuber, 1998) describes the small changes needed for SARSA(λ) implementations as they arise.
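For concreteness, the core of a SARSA(λ) step with accumulating eligibility traces looks roughly like this (a sketch under the same assumptions as above, with E a trace matrix of the same shape as Q and lam the trace-decay parameter; this is illustrative, not the paper's implementation):

    # After observing (s, a, r, s_next, a_next) within an episode:
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0            # accumulating trace for the visited pair
    Q += alpha * delta * E    # all recently visited pairs share the TD error
    E *= gamma * lam          # traces decay by gamma * lambda every step
    # E is typically reset to zeros at the start of each episode.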

33 questions
146
votes
8 answers

What is the difference between Q-learning and SARSA?

Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (to me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and…
Ælex
  • 14,432
  • 20
  • 88
  • 129
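The gist of the distinction, in code: the two methods share the same update form and differ only in the bootstrap target (a sketch; Q is a table indexed by state and action, as in the wiki example above):

    # SARSA (on-policy): bootstrap from the action actually taken next.
    sarsa_target = r + gamma * Q[s_next, a_next]
    # Q-learning (off-policy): bootstrap from the best available action.
    q_target = r + gamma * np.max(Q[s_next])
    # Identical update form for both:
    Q[s, a] += alpha * (sarsa_target - Q[s, a])   # or q_target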
17
votes
1 answer

Eligibility trace reinitialization between episodes in SARSA-Lambda implementation

I'm looking at this SARSA-Lambda implementation (i.e. SARSA with eligibility traces) and there's a detail which I still don't get. (Image from http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node77.html) So I understand that all Q(s,a) are updated…
MrD
  • 4,986
  • 11
  • 48
  • 90
11
votes
3 answers

Are Q-learning and SARSA with greedy selection equivalent?

The difference between Q-learning and SARSA is that Q-learning compares the current state and the best possible next state, whereas SARSA compares the current state against the actual next state. If a greedy selection policy is used, that is, the…
Mouscellaneous
  • 2,584
  • 3
  • 27
  • 37
6
votes
1 answer

Episodic Semi-gradient Sarsa with Neural Network

While trying to implement the Episodic Semi-gradient Sarsa with a Neural Network as the approximator I wondered how I choose the optimal action based on the currently learned weights of the network. If the action space is discrete I can just…
zimmerrol
  • 4,872
  • 3
  • 22
  • 41
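One common answer for a discrete action space is to have the network output one Q value per action, evaluate it once for the current state, and take the argmax (a sketch; q_network here is a hypothetical callable returning a vector of per-action values):

    import numpy as np

    def select_action(q_network, state, n_actions, epsilon, rng):
        # Epsilon-greedy over the network's per-action Q estimates.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))   # explore
        q_values = q_network(state)               # assumed shape: (n_actions,)
        return int(np.argmax(q_values))           # exploit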
5
votes
1 answer

Why is there no n-step Q-learning algorithm in Sutton's RL book?

I think I am messing something up. I always thought that:
  • 1-step TD on-policy = Sarsa
  • 1-step TD off-policy = Q-learning
Thus I conclude:
  • n-step TD on-policy = n-step Sarsa
  • n-step TD off-policy = n-step Q-learning
In Sutton's book,…
siva
  • 1,183
  • 3
  • 12
  • 28
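For reference, the n-step Sarsa target in Sutton & Barto bootstraps n steps ahead, reducing to the one-step rule in the tag wiki when n = 1:

G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^n Q(S_{t+n}, A_{t+n})

Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [G_{t:t+n} - Q(S_t, A_t)]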
4
votes
1 answer

Understanding linear, gradient-descent Sarsa (based on Sutton & Barto)

I'm trying to implement linear gradient-descent Sarsa based on Sutton & Barto's Book, see the algorithm in the picture below. However, I struggle to understand something in the algorithm: Is the dimension of w and z independent of how many…
bbiegel
  • 207
  • 2
  • 8
3
votes
1 answer

SARSA value approximation for Cart Pole

I have a question on this SARSA FA. In input cell 142 I see this modified update w += alpha * (reward - discount * q_hat_next) * q_hat_grad where q_hat_next is Q(S', a') and q_hat_grad is the derivative of Q(S, a) (assume S, a, R, S' a'…
Chuk Lee
  • 3,570
  • 22
  • 19
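For comparison, the episodic semi-gradient Sarsa weight update in Sutton & Barto uses the full TD error rather than the reward minus the discounted next value (a sketch; q_hat_current, q_hat_next, and q_hat_grad are assumed to be the approximator's value at (S, a), its value at (S', a'), and its gradient at (S, a)):

    # Non-terminal step: bootstrap from the next state-action value.
    td_error = reward + discount * q_hat_next - q_hat_current
    w += alpha * td_error * q_hat_grad
    # Terminal step: the bootstrap term is dropped.
    # w += alpha * (reward - q_hat_current) * q_hat_grad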
3
votes
1 answer

Sarsa algorithm, why Q-values tend to zero?

I'm trying to implement the Sarsa algorithm for solving the Frozen Lake environment from OpenAI gym. I've only recently started working on this, but I think I understand it. I also understand how the Sarsa algorithm works; there are many sites where to find a…
CSR95
  • 121
  • 8
3
votes
1 answer

Deep Neural Network combined with Q-learning

I'm using joint positions from a Kinect camera as my state space but I think it's going to be too large (25 joints x 30 per second) to just feed into SARSA or Q-learning. Right now I'm using the Kinect Gesture Builder program which uses Supervised…
3
votes
0 answers

Implementing Eligibility Traces in SARSA

I am writing a MATLAB implementation of the SARSA algorithm, and have successfully written a one-step implementation. I am now trying to extend it to use eligibility traces, but the results I obtain are worse than with one-step. (i.e. the algorithm…
MrD
  • 4,986
  • 11
  • 48
  • 90
3
votes
1 answer

SARSA Implementation

I am learning about SARSA algorithm implementation and had a question. I understand that the general "learning" step takes the form of: Robot (r) is in state s. There are four actions available: North (n), East (e), West (w) and South (s) such…
MrD
  • 4,986
  • 11
  • 48
  • 90
2
votes
0 answers

Implementing Sarsa(lambda) - Gridworld - in Julia language

Could you explain to me what is wrong in this code? I am trying to implement SARSA(lambda) with eligibility traces. using ReinforcementLearningBase, GridWorlds using PyPlot world = GridWorlds.GridRoomsDirectedModule.GridRoomsDirected(); env =…
przel123
  • 21
  • 1
2
votes
0 answers

Is this true? What about Expected SARSA and Double Q-learning?

I'm studying reinforcement learning and I'm facing a problem understanding the difference between SARSA, Q-learning, Expected SARSA, Double Q-learning and temporal difference. Can you please explain the difference and tell me when to use each? And…
2
votes
1 answer

Sarsa and Q-learning (reinforcement learning) don't converge to the optimal policy

I have a question about my own project for testing a reinforcement learning technique. First let me explain the purpose. I have an agent which can take 4 actions during 8 steps. At the end of these eight steps, the agent can be in 5 possible…
T.L
  • 21
  • 4
2
votes
1 answer

Sarsa with neural network to solve the Mountain Car Task

I am trying to implement the Episodic Semi-gradient Sarsa for Estimating q described in Sutton's book to solve the Mountain Car Task. To approximate q I want to use a neural network. Therefore, I came up with this code. But sadly my agent is not…
zimmerrol
  • 4,872
  • 3
  • 22
  • 41