Questions tagged [temporal-difference]

Temporal difference (TD) learning is a prediction method used primarily for solving the reinforcement learning problem.

Temporal-difference (TD) learning combines Monte Carlo ideas with dynamic programming ideas. Like dynamic programming, TD updates its estimates based in part on other learned estimates, without waiting for a final outcome (it bootstraps). Like Monte Carlo methods, it learns by sampling the environment according to some policy. The bootstrapping idea can be illustrated with the following example: suppose you wish to predict the weather for Saturday, and you have a model that predicts Saturday's weather given the weather of each preceding day of the week. In the standard case, you would wait until Saturday and then adjust all of your models. However, by Friday you should already have a pretty good idea of what Saturday's weather will be, and thus be able to adjust, say, Monday's prediction before Saturday arrives.
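As a concrete illustration of that bootstrapped update, here is a minimal tabular TD(0) sketch; the state names, step size, and discount factor are illustrative assumptions, not part of the tag description:

```python
# Tabular TD(0) prediction: move the value of the current state toward
# the bootstrapped target r + gamma * V(next_state).
from collections import defaultdict

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """One TD(0) update for a value table V (dict mapping state -> value)."""
    td_target = reward + gamma * V[next_state]   # bootstrapped estimate of the return
    td_error = td_target - V[state]              # temporal-difference error
    V[state] += alpha * td_error
    return td_error

# Hypothetical usage, echoing the weather example: adjust Monday's estimate
# as soon as Tuesday's estimate is available, without waiting for Saturday.
V = defaultdict(float)
td0_update(V, state="Monday", reward=0.0, next_state="Tuesday")
```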

The TD algorithm has also received attention in the field of neuroscience. TD(λ) was introduced by Richard S. Sutton. A good starting point for learning about temporal difference can be found here.

40 questions
23
votes
1 answer

Q-learning vs temporal-difference vs model-based reinforcement learning

I'm in a course called "Intelligent Machines" at the university. We were introduced to 3 methods of reinforcement learning, and with those we were given the intuition of when to use them, and I quote: Q-Learning - Best when MDP can't be solved.…
14
votes
1 answer

Implementing the TD-Gammon algorithm

I am attempting to implement the algorithm from the TD-Gammon article by Gerald Tesauro. The core of the learning algorithm is described in the following paragraph: I have decided to have a single hidden layer (if that was enough to play…
9
votes
3 answers

Stuck in understanding the difference between the update rules of TD(0) and TD(λ)

I'm studying Temporal difference learning from this post. Here the update rule of TD(0) is clear to me but in TD(λ), I don't understand how utility values of all the previous states are updated in a single update. Here is the diagram given for…
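For readers stuck on the same point: in the backward (eligibility-trace) view of TD(λ), every visited state keeps a decaying trace, so a single TD error updates all previously visited states at once. A minimal tabular sketch, with hypothetical state and parameter names:

```python
# Backward-view TD(lambda): one TD error updates every previously visited
# state in proportion to its eligibility trace.
def td_lambda_step(V, traces, state, reward, next_state,
                   alpha=0.1, gamma=0.99, lam=0.8):
    delta = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
    traces[state] = traces.get(state, 0.0) + 1.0        # bump trace for current state
    for s in traces:                                     # every visited state shares the update
        V[s] = V.get(s, 0.0) + alpha * delta * traces[s]
        traces[s] *= gamma * lam                         # traces decay each step
```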
6
votes
1 answer

Neural Network and Temporal Difference Learning

I have read a few papers and lectures on temporal difference learning (some as they pertain to neural nets, such as the Sutton tutorial on TD-Gammon) but I am having a difficult time understanding the equations, which leads me to my questions.…
5
votes
0 answers

Neural Network Reinforcement Learning Requiring Next-State Propagation For Backpropagation

I am attempting to construct a neural network incorporating convolution and LSTM (using the Torch library) to be trained by Q-learning or Advantage-learning, both of which require propagating state T+1 through the network before updating the weights…
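The pattern this question describes, namely a forward pass on state t+1 to form the bootstrapped target before backpropagating on state t, looks roughly like the hedged PyTorch sketch below; the model, optimizer, and tensor shapes are assumptions and this is not the asker's Torch/LSTM setup:

```python
import torch

# Assumed: value_net maps a state tensor to a scalar value estimate.
def bootstrapped_update(value_net, optimizer, state, reward, next_state, gamma=0.99):
    with torch.no_grad():                      # target pass: no gradients through s_{t+1}
        target = reward + gamma * value_net(next_state)
    prediction = value_net(state)              # prediction pass on s_t
    loss = torch.nn.functional.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()                            # backpropagate only through the s_t pass
    optimizer.step()
    return loss.item()
```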
4
votes
1 answer

Is Monte Carlo learning policy or value iteration (or something else)?

I am taking a Reinforcement Learning class and I didn’t understand how to combine the concepts of policy iteration/value iteration with Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how can the empty cells be filled: Should/can it…
4
votes
3 answers

TD(λ) in Delphi/Pascal (Temporal Difference Learning)

I have an artificial neural network which plays Tic-Tac-Toe - but it is not complete yet. What I have so far: the reward array "R[t]" with integer values for every timestep or move "t" (1 = player A wins, 0 = draw, -1 = player B wins). The input values are…
4
votes
1 answer

How to choose an action in TD(0) learning

I am currently reading Sutton's book Reinforcement Learning: An Introduction. After reading chapter 6.1 I wanted to implement a TD(0) RL algorithm for this setting: To do this, I tried to implement the pseudo-code presented here: Doing this I…
zimmerrol
  • 4,872
  • 3
  • 22
  • 41
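Since TD(0) by itself only evaluates a policy, actions are usually chosen by whatever policy is being evaluated, commonly ε-greedy over the current estimates. A minimal sketch (this is not the pseudo-code the question refers to; the action values are hypothetical):

```python
import random

def epsilon_greedy(action_values, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    actions = list(action_values)
    if random.random() < epsilon:
        return random.choice(actions)            # explore
    return max(actions, key=lambda a: action_values[a])  # exploit

# Hypothetical usage: action_values maps actions to current value estimates.
a = epsilon_greedy({"left": 0.2, "right": 0.5})
```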
4
votes
4 answers

TD learning vs Q learning

In a perfect information environment, where we are able to know the state after an action, like playing chess, is there any reason to use Q-learning rather than TD (temporal difference) learning? As far as I understand, TD learning will try to learn…
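A compact way to see the distinction being asked about: TD(0) learns state values under the policy generating the data, while Q-learning learns action values and bootstraps on the greedy next action. A minimal tabular sketch with hypothetical names:

```python
from collections import defaultdict

# V and Q are assumed to be defaultdict(float) tables.

def td0(V, s, r, s_next, alpha=0.1, gamma=0.99):
    # State-value TD(0): evaluates whatever policy generated the transition.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def q_learning(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Q-learning: off-policy control, bootstrapping on the greedy next action.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

With a perfect model of state transitions (as in chess), learning state values of afterstates is often enough; Q-learning's action values matter mainly when the next state cannot be predicted from the current state and action.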
4
votes
1 answer

Updates in Temporal Difference Learning

I read about Tesauro's TD-Gammon program and would love to implement it for tic tac toe, but almost all of the information is inaccessible to me as a high school student because I don't know the terminology. The first equation here,…
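For anyone else blocked on the terminology: the general weight update behind Tesauro's TD-Gammon is Sutton's TD(λ) rule, which can be written as below. This is the general form, not necessarily the specific equation the question points to.

```latex
w_{t+1} - w_t \;=\; \alpha\,\bigl(Y_{t+1} - Y_t\bigr)\sum_{k=1}^{t}\lambda^{\,t-k}\,\nabla_{w} Y_k
```

Here \(Y_t\) is the network's prediction at time \(t\), \(\alpha\) is the learning rate, and \(\lambda\) controls how far credit is passed back to earlier predictions.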
3
votes
1 answer

Implementing a loss function (MSVE) in Reinforcement learning

I am trying to build a temporal difference learning agent for Othello. While the rest of my implementation seems to run as intended, I am wondering about the loss function used to train my network. In Sutton's book "Reinforcement Learning: An…
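For reference, the mean squared value error from Sutton & Barto (the objective this question asks about) weights each state's squared error by the on-policy state distribution \(\mu\):

```latex
\overline{\mathrm{VE}}(\mathbf{w}) \;=\; \sum_{s} \mu(s)\,\bigl[v_\pi(s) - \hat{v}(s,\mathbf{w})\bigr]^{2}
```

In practice \(v_\pi(s)\) is unknown and is replaced by a sampled or bootstrapped target, which is where the TD error enters the loss.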
3
votes
2 answers

Analysis over time comparing 2 dataframes row by row

This is a small portion of the dataframe I am working with, for reference. I am working with a data frame (MG53_HanLab) in R that has a column for Time, several columns with the name "MG53" in them, several columns with the name "F2", and several with…
Kristyn
  • 33
  • 5
2
votes
0 answers

How do you create an optimizer for the TD-Lambda method in Tensorflow 2.0?

I am trying to implement TD-Gammon, as described in this paper, which uses the TD-Lambda learning algorithm. This has been done already here, but it is 4 years old and doesn't use Tensorflow 2. I am trying to do this in Tensorflow 2 and think I…
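One common way to express TD(λ) in TensorFlow 2 is to bypass the built-in optimizers: take gradients of the network's output with tf.GradientTape, fold them into eligibility traces, and apply the trace-scaled TD error to the variables directly. A hedged sketch under assumed model, shapes, and hyperparameters:

```python
import tensorflow as tf

def td_lambda_update(model, traces, state, reward, next_state,
                     alpha=0.01, gamma=0.99, lam=0.7):
    """One backward-view TD(lambda) update applied directly to model variables.

    traces is assumed to be created once as
    [tf.Variable(tf.zeros_like(v)) for v in model.trainable_variables].
    """
    with tf.GradientTape() as tape:
        value = model(state)                          # forward pass on current state
    grads = tape.gradient(value, model.trainable_variables)
    next_value = model(next_state)                    # bootstrapped target value
    delta = tf.squeeze(reward + gamma * next_value - value)  # scalar TD error
    for trace, grad, var in zip(traces, grads, model.trainable_variables):
        trace.assign(gamma * lam * trace + grad)      # decay and accumulate the trace
        var.assign_add(alpha * delta * trace)         # move weights along the trace
```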
2
votes
0 answers

Is this true? What about Expected SARSA and Double Q-Learning?

I'm studying Reinforcement Learning and I'm facing a problem understanding the difference between SARSA, Q-Learning, Expected SARSA, Double Q-Learning and temporal difference. Can you please explain the difference and tell me when to use each? And…
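For what it's worth, the methods this question lists differ mainly in the bootstrapped target used inside the TD update \(Q(s,a) \leftarrow Q(s,a) + \alpha\,[\text{target} - Q(s,a)]\). A compact summary of the standard targets (a sketch of the distinctions, not a full answer):

```latex
\begin{aligned}
\text{SARSA:}          &\quad r + \gamma\, Q(s', a') \\
\text{Q-learning:}     &\quad r + \gamma\, \max_{a'} Q(s', a') \\
\text{Expected SARSA:} &\quad r + \gamma \sum_{a'} \pi(a' \mid s')\, Q(s', a') \\
\text{Double Q:}       &\quad r + \gamma\, Q_2\!\bigl(s', \arg\max_{a'} Q_1(s', a')\bigr)
\end{aligned}
```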
2
votes
1 answer

Reinforcement Learning: Q and Q(λ) speed difference on Windy Grid World environment

Preface: I have attempted to solve this Windy-Grid-World env. Having implemented both the Q and Q(λ) algorithms, the results are pretty much the same (I am looking at steps per episode). Problem: From what I have read, I believe that a higher lambda…