
I considered putting this question in the actual GitHub repo for the project, but it could be a stupid question. Here is the repo: https://github.com/tensorflow/tfjs-examples/tree/master/snake-dqn

If you look in the following files and lines, you will see that the optimizer doesn't appear to have any link to the online model:

First, the online model is not compiled with any optimizer in file dqn.js lines 33-60: https://github.com/tensorflow/tfjs-examples/blob/master/snake-dqn/dqn.js#L60

Then, in the following files, an optimizer is initialized with no link to the model:

File agent.js line 60: https://github.com/tensorflow/tfjs-examples/blob/master/snake-dqn/agent.js#L60

File train.js line 84: https://github.com/tensorflow/tfjs-examples/blob/master/snake-dqn/train.js#L84

Optimizer used in file agent.js line 157: https://github.com/tensorflow/tfjs-examples/blob/908ee32750ba750a14d15caeb53115e2d3dda2b3/snake-dqn/agent.js#L157

I see no other references, so how does it actually update and train the online network as suggested?

Just as a side note: isn't this actually a double DQN?

mrpetem

1 Answer


This is an interesting question, and I was wondering the same thing when studying an Actor-Critic TensorFlow implementation.

I guess a main source of confusion comes from the difference between the traditional mathematical notation for neural network weight updates and its implementation in a programming language. Books and whitepapers often depict this as an update operation on the model as a whole, while in reality it can be narrowed down to just updating the weights.
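For example, a textbook typically writes plain gradient descent as a single rule over the whole parameter vector,

θ ← θ − α · ∇θ L(θ)

which already shows that only the parameters θ (the weights) change; the rest of the "model" is untouched.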

In fact, the weights are the only part of the model that needs to be altered by an optimizer. That's why most optimizers are model-agnostic (they don't care which model the weights belong to, or even whether there is a model at all). An optimizer just updates variables using their gradients, and this is what you see on line 157 of agent.js that you refer to.
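As a minimal sketch of that model-agnostic behaviour (the variable name w and the hard-coded gradient below are made up for illustration, not taken from the repo):

```js
const tf = require('@tensorflow/tfjs');

// A bare trainable variable, registered with the engine under the name 'w'.
// There is no model anywhere in sight.
const w = tf.variable(tf.tensor1d([1, 2, 3]), true, 'w');

const optimizer = tf.train.sgd(0.1);

// The optimizer receives a {variableName: gradientTensor} map; it neither
// knows nor cares where the gradients came from.
optimizer.applyGradients({w: tf.tensor1d([0.5, 0.5, 0.5])});

w.print(); // [0.95, 1.95, 2.95] -- i.e. w - 0.1 * grad
```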

However, there still is a link to the model in the flow, and it's the loss function. It is used to calculate gradients with tf.variableGrads(lossFunction), and inside the loss function both this.onlineNetwork and this.targetNetwork are used to get the TD error. Here comes the notation difference mentioned above: in a book, calculating a TD error (loss) gives you numerical values. In tfjs we instead get gradients as a mapping from each trainable variable to its corresponding gradient tensor, and this is exactly what optimizer.applyGradients requires as its argument.
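Here is a minimal sketch of that flow, assuming a tiny stand-in model and a random stand-in for the TD target (neither is the repo's actual code):

```js
const tf = require('@tensorflow/tfjs');

// Stand-in for this.onlineNetwork: any model with trainable weights.
const model = tf.sequential({
  layers: [tf.layers.dense({units: 1, inputShape: [4]})],
});
const optimizer = tf.train.adam(1e-3);

const xs = tf.randomNormal([8, 4]);
const target = tf.randomNormal([8, 1]); // stand-in for the TD target

// The loss function is the only link between model and optimizer: it closes
// over `model`, so the gradients land on the model's weight variables.
const lossFn = () => tf.losses.meanSquaredError(target, model.apply(xs));

// grads.grads maps each trainable variable's name to its gradient tensor.
const grads = tf.variableGrads(lossFn);
optimizer.applyGradients(grads.grads);
tf.dispose(grads);
```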

You can see this if you debug this DQN example (e.g. in the Chrome Node Inspector).

(Screenshot: the grads object, with variable names as keys and gradient Tensors as values.)

Under the hood, the TensorFlow.js engine stores all the trainable variables and can address them by their names. One can see this in the Adam optimizer source code (the optimizer used in this example).
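You can inspect that registry directly; the variable names below are just examples of what tfjs layer weights typically look like:

```js
// Every tf.variable and every layer weight registers itself with the
// engine under a unique name.
console.log(Object.keys(tf.engine().registeredVariables));
// e.g. ['dense_Dense1/kernel', 'dense_Dense1/bias', ...]
```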

And if you break just after the gradients are ready (line 166 as shown in the screenshot) you can verify that the optimizer is actually directly dealing with the model weights:

```js
tf.engine().registeredVariables[Object.keys(grads.grads)[0]].dataSync()
```

and

```js
this.onlineNetwork.getNamedWeights()[0].tensor.dataSync()
```

will show you the same model weight values.

So the complete chain from the model to the optimizer, simplified, looks like: this.onlineNetwork > lossFunction > tf.variableGrads > optimizer.applyGradients
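Putting it all together, here is a simplified sketch in the spirit of the repo's trainOnReplayBatch. The name trainStep, the batch fields, and gamma are assumptions for illustration, and it relies on the target network's weights being frozen (trainable = false) so that tf.variableGrads only returns gradients for the online network:

```js
// Simplified, hypothetical sketch: `batch` holds tensors (states, actions
// as int32, rewards, nextStates, dones as 0/1 floats); names are made up.
function trainStep(onlineNetwork, targetNetwork, optimizer, batch, gamma) {
  const lossFn = () => tf.tidy(() => {
    // Q-values of the actions actually taken, from the online network.
    const qs = onlineNetwork.apply(batch.states, {training: true});
    const numActions = qs.shape[1];
    const actionQs = qs.mul(tf.oneHot(batch.actions, numActions)).sum(-1);

    // Learning target computed with the (frozen) target network.
    const nextMaxQ = targetNetwork.predict(batch.nextStates).max(-1);
    const targets = batch.rewards.add(
        nextMaxQ.mul(gamma).mul(tf.scalar(1).sub(batch.dones)));
    return tf.losses.meanSquaredError(targets, actionQs);
  });

  const grads = tf.variableGrads(lossFn);  // {value, grads: {name: grad}}
  optimizer.applyGradients(grads.grads);   // updates online weights in place
  tf.dispose(grads);
}
```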

And yes, this example uses a double DQN to decouple action selection from the learning-target calculation.

hypers