Questions tagged [temporal-difference]

Temporal difference (TD) learning is a prediction method used primarily for solving the reinforcement learning problem.

Temporal-difference (TD) learning combines Monte Carlo ideas with dynamic programming ideas. Like dynamic programming, TD updates its estimates based in part on other learned estimates, without waiting for a final outcome (it bootstraps). Like Monte Carlo methods, it learns by sampling the environment according to some policy. The bootstrapping idea can be illustrated with the following example: suppose you wish to predict the weather for Saturday, and you have a model that predicts Saturday's weather given the weather of each preceding day of the week. In the standard case, you would wait until Saturday and then adjust all of your models. However, by Friday you should already have a pretty good idea of what Saturday's weather will be, and thus be able to adjust, say, Monday's prediction before Saturday arrives.
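As a concrete illustration of that bootstrapped update, here is a minimal tabular TD(0) sketch; the state names, step size, and discount factor are illustrative assumptions, not part of the tag description:

```python
# Tabular TD(0) prediction: move the value of the current state toward
# the bootstrapped target r + gamma * V(next_state).
from collections import defaultdict

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """One TD(0) update for a value table V (dict mapping state -> value)."""
    td_target = reward + gamma * V[next_state]   # bootstrapped estimate of the return
    td_error = td_target - V[state]              # temporal-difference error
    V[state] += alpha * td_error
    return td_error

# Hypothetical usage, echoing the weather example: adjust Monday's estimate
# as soon as Tuesday's estimate is available, without waiting for Saturday.
V = defaultdict(float)
td0_update(V, state="Monday", reward=0.0, next_state="Tuesday")
```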

The TD algorithm has also received attention in the field of neuroscience. TD(λ) was introduced by Richard S. Sutton. A good starting point for learning about temporal difference can be found here.

40 questions
23
votes
1 answer

Q-learning vs temporal-difference vs model-based reinforcement learning

I'm in a course called "Intelligent Machines" at the university. We were introduced to 3 methods of reinforcement learning, and with those we were given the intuition of when to use them, and I quote: Q-Learning - Best when MDP can't be solved.…
14
votes
1 answer

Implementing the TD-Gammon algorithm

I am attempting to implement the algorithm from the TD-Gammon article by Gerald Tesauro. The core of the learning algorithm is described in the following paragraph: I have decided to have a single hidden layer (if that was enough to play…
9
votes
3 answers

Stuck in understanding the difference between the update rules of TD(0) and TD(λ)

I'm studying Temporal difference learning from this post. Here the update rule of TD(0) is clear to me but in TD(λ), I don't understand how utility values of all the previous states are updated in a single update. Here is the diagram given for…
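For readers stuck on the same point: in the backward (eligibility-trace) view of TD(λ), every visited state keeps a decaying trace, so a single TD error updates all previously visited states at once. A minimal tabular sketch, with hypothetical state and parameter names:

```python
# Backward-view TD(lambda): one TD error updates every previously visited
# state in proportion to its eligibility trace.
def td_lambda_step(V, traces, state, reward, next_state,
                   alpha=0.1, gamma=0.99, lam=0.8):
    delta = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
    traces[state] = traces.get(state, 0.0) + 1.0        # bump trace for current state
    for s in traces:                                     # every visited state shares the update
        V[s] = V.get(s, 0.0) + alpha * delta * traces[s]
        traces[s] *= gamma * lam                         # traces decay each step
```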
6
votes
1 answer

Neural Network and Temporal Difference Learning

I have read a few papers and lectures on temporal difference learning (some as they pertain to neural nets, such as the Sutton tutorial on TD-Gammon) but I am having a difficult time understanding the equations, which leads me to my questions.…
5
votes
0 answers

Neural Network Reinforcement Learning Requiring Next-State Propagation For Backpropagation

I am attempting to construct a neural network incorporating convolution and LSTM (using the Torch library) to be trained by Q-learning or Advantage-learning, both of which require propagating state T+1 through the network before updating the weights…
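The pattern this question describes, namely a forward pass on state t+1 to form the bootstrapped target before backpropagating on state t, looks roughly like the hedged PyTorch sketch below; the model, optimizer, and tensor shapes are assumptions and this is not the asker's Torch/LSTM setup:

```python
import torch

# Assumed: value_net maps a state tensor to a scalar value estimate.
def bootstrapped_update(value_net, optimizer, state, reward, next_state, gamma=0.99):
    with torch.no_grad():                      # target pass: no gradients through s_{t+1}
        target = reward + gamma * value_net(next_state)
    prediction = value_net(state)              # prediction pass on s_t
    loss = torch.nn.functional.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()                            # backpropagate only through the s_t pass
    optimizer.step()
    return loss.item()
```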
4
votes
1 answer

Is Monte Carlo learning policy or value iteration (or something else)?

I am taking a Reinforcement Learning class and I didn’t understand how to combine the concepts of policy iteration/value iteration with Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how can the empty cells be filled: Should/can it…
4
votes
3 answers

TD(λ) in Delphi/Pascal (Temporal Difference Learning)

I have an artificial neural network which plays Tic-Tac-Toe - but it is not complete yet. What I have so far: the reward array "R[t]" with integer values for every timestep or move "t" (1 = player A wins, 0 = draw, -1 = player B wins). The input values are…
4
votes
1 answer

How to choose an action in TD(0) learning

I am currently reading Sutton's book Reinforcement Learning: An Introduction. After reading chapter 6.1 I wanted to implement a TD(0) RL algorithm for this setting: To do this, I tried to implement the pseudo-code presented here: Doing this I…
zimmerrol
  • 4,872
  • 3
  • 22
  • 41
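Since TD(0) by itself only evaluates a policy, actions are usually chosen by whatever policy is being evaluated, commonly ε-greedy over the current estimates. A minimal sketch (this is not the pseudo-code the question refers to; the action values are hypothetical):

```python
import random

def epsilon_greedy(action_values, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    actions = list(action_values)
    if random.random() < epsilon:
        return random.choice(actions)            # explore
    return max(actions, key=lambda a: action_values[a])  # exploit

# Hypothetical usage: action_values maps actions to current value estimates.
a = epsilon_greedy({"left": 0.2, "right": 0.5})
```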
4
votes
4 answers

TD learning vs Q learning

In a perfect information environment, where we are able to know the state after an action, like playing chess, is there any reason to use Q-learning rather than TD (temporal difference) learning? As far as I understand, TD learning will try to learn…
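A compact way to see the distinction being asked about: TD(0) learns state values under the policy generating the data, while Q-learning learns action values and bootstraps on the greedy next action. A minimal tabular sketch with hypothetical names:

```python
from collections import defaultdict

# V and Q are assumed to be defaultdict(float) tables.

def td0(V, s, r, s_next, alpha=0.1, gamma=0.99):
    # State-value TD(0): evaluates whatever policy generated the transition.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def q_learning(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Q-learning: off-policy control, bootstrapping on the greedy next action.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

With a perfect model of state transitions (as in chess), learning state values of afterstates is often enough; Q-learning's action values matter mainly when the next state cannot be predicted from the current state and action.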
4
votes
1 answer

Updates in Temporal Difference Learning

I read about Tesauro's TD-Gammon program and would love to implement it for tic tac toe, but almost all of the information is inaccessible to me as a high school student because I don't know the terminology. The first equation here,…
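For anyone else blocked on the terminology: the general weight update behind Tesauro's TD-Gammon is Sutton's TD(λ) rule, which can be written as below. This is the general form, not necessarily the specific equation the question points to.

```latex
w_{t+1} - w_t \;=\; \alpha\,\bigl(Y_{t+1} - Y_t\bigr)\sum_{k=1}^{t}\lambda^{\,t-k}\,\nabla_{w} Y_k
```

Here \(Y_t\) is the network's prediction at time \(t\), \(\alpha\) is the learning rate, and \(\lambda\) controls how far credit is passed back to earlier predictions.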
3
votes
1 answer

Implementing a loss function (MSVE) in Reinforcement learning

I am trying to build a temporal difference learning agent for Othello. While the rest of my implementation seems to run as intended, I am wondering about the loss function used to train my network. In Sutton's book "Reinforcement Learning: An…
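For reference, the mean squared value error from Sutton & Barto (the objective this question asks about) weights each state's squared error by the on-policy state distribution \(\mu\):

```latex
\overline{\mathrm{VE}}(\mathbf{w}) \;=\; \sum_{s} \mu(s)\,\bigl[v_\pi(s) - \hat{v}(s,\mathbf{w})\bigr]^{2}
```

In practice \(v_\pi(s)\) is unknown and is replaced by a sampled or bootstrapped target, which is where the TD error enters the loss.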
3
votes
2 answers

Analysis over time comparing 2 dataframes row by row

This is a small portion of the dataframe I am working with, for reference. I am working with a data frame (MG53_HanLab) in R that has a column for Time, several columns with the name "MG53" in them, several columns with the name "F2", and several with…
Kristyn
  • 33
  • 5
2
votes
0 answers

How do you create an optimizer for the TD-Lambda method in Tensorflow 2.0?

I am trying to implement TD-Gammon, as described in this paper, which uses the TD-Lambda learning algorithm. This has been done already here, but it is 4 years old and doesn't use Tensorflow 2. I am trying to do this in Tensorflow 2 and think I…
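One common way to express TD(λ) in TensorFlow 2 is to bypass the built-in optimizers: take gradients of the network's output with tf.GradientTape, fold them into eligibility traces, and apply the trace-scaled TD error to the variables directly. A hedged sketch under assumed model, shapes, and hyperparameters:

```python
import tensorflow as tf

def td_lambda_update(model, traces, state, reward, next_state,
                     alpha=0.01, gamma=0.99, lam=0.7):
    """One backward-view TD(lambda) update applied directly to model variables.

    traces is assumed to be created once as
    [tf.Variable(tf.zeros_like(v)) for v in model.trainable_variables].
    """
    with tf.GradientTape() as tape:
        value = model(state)                          # forward pass on current state
    grads = tape.gradient(value, model.trainable_variables)
    next_value = model(next_state)                    # bootstrapped target value
    delta = tf.squeeze(reward + gamma * next_value - value)  # scalar TD error
    for trace, grad, var in zip(traces, grads, model.trainable_variables):
        trace.assign(gamma * lam * trace + grad)      # decay and accumulate the trace
        var.assign_add(alpha * delta * trace)         # move weights along the trace
```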
2
votes
0 answers

Is this true? What about Expected SARSA and Double Q-Learning?

I'm studying Reinforcement Learning and I'm facing a problem understanding the difference between SARSA, Q-Learning, Expected SARSA, Double Q-Learning and temporal difference. Can you please explain the difference and tell me when to use each? And…
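For what it's worth, the methods this question lists differ mainly in the bootstrapped target used inside the TD update \(Q(s,a) \leftarrow Q(s,a) + \alpha\,[\text{target} - Q(s,a)]\). A compact summary of the standard targets (a sketch of the distinctions, not a full answer):

```latex
\begin{aligned}
\text{SARSA:}          &\quad r + \gamma\, Q(s', a') \\
\text{Q-learning:}     &\quad r + \gamma\, \max_{a'} Q(s', a') \\
\text{Expected SARSA:} &\quad r + \gamma \sum_{a'} \pi(a' \mid s')\, Q(s', a') \\
\text{Double Q:}       &\quad r + \gamma\, Q_2\!\bigl(s', \arg\max_{a'} Q_1(s', a')\bigr)
\end{aligned}
```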
2
votes
1 answer

Reinforcement Learning: Q and Q(λ) speed difference on Windy Grid World environment

Preface: I have attempted to solve this Windy-Grid-World env. Having implemented both the Q and Q(λ) algorithms, the results are pretty much the same (I am looking at steps per episode). Problem: From what I have read, I believe that a higher lambda…