0

For a university project, I want to train a (simulated) robot to hit a ball given its position and velocity. The first thing to try is policy gradients: I have a parametric trajectory generator. For every training position, I feed the position through my network, send the resulting trajectory to the simulator, and get a reward back. I can then use that reward as the loss, sample the gradient, feed it back, and update the weights of my network so that it does better next time.

The goal, therefore, is to learn the mapping from position to trajectory weights. When using compute-graph libraries like Theano and TensorFlow (or Keras), the problem is that I do not know how to actually model that system. I want standard fully connected layers first, with the trajectory weights as the output. But how do I actually calculate the loss so that backprop can use it?

In a custom loss function, I would ignore/not specify the true labels, run the simulator, and return the loss it gives. But from what I have read, the loss needs to be a symbolic Theano/TensorFlow expression. My loss is quite complicated, so I do not want to move it from the simulator into the network. How can I implement that? A further problem is differentiating that loss, as I might need to sample to estimate the gradient.

jcklie
In TensorFlow you can feed any tensor. This means that you can create a network with a dummy loss on top, and then use `feed_dict` to replace the dummy loss with your externally computed loss/gradient direction – Yaroslav Bulatov Aug 11 '16 at 16:24

1 Answer

1

I had a similar problem some time ago.

There was a loss function which relied heavily on optimized C code and third-party libraries. Porting it to TensorFlow was not feasible.

But we still wanted to train a TensorFlow graph to create steering signals from the current setup.

Here is an IPython notebook which explains how to mix numerical and analytical derivatives: https://nbviewer.jupyter.org/gist/lhk/5943fa09922693a0fbbbf8dc9d1b05c0

Here is a more detailed description of the idea behind it:

Training the graph is an optimization problem, so you will definitely need the derivative of the loss. The challenge is to mix the analytical derivatives that TensorFlow computes with the numerical derivative of your loss.

You need this setup:

  • Input I
  • Output P
  • Graph G that maps I to P: P = G(I)
  • A constant C of the same shape as P, used to form P~ = C * G(I)
  • Loss function L

Training the TensorFlow graph works with backpropagation. For every parameter X in the graph, the following derivative is computed:

dL/dX = dL/dP * dP/dX

The second part of that, dP/dX, comes for free just by setting up the TensorFlow graph. But we still need the derivative of the loss, dL/dP.

Now there's a trick.

We want TensorFlow to update X based on the correct gradient dL/dP * dP/dX, but we can't get TensorFlow to compute dL/dP, because the loss is not part of the TensorFlow graph.

We will instead use P~ = P * C;

the derivative of that is dP~ / dX = dP/dX * C

So if we set C to dL/dP, we get the correct gradient.

We simply have to estimate dL/dP with a numerical gradient and feed it in as C.
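For example, dL/dP can be estimated with central differences, treating the loss as a black box. This is a minimal sketch; `numerical_grad` and the quadratic test loss are illustrative names, not part of any library:

```python
import numpy as np

def numerical_grad(loss_fn, P, eps=1e-5):
    """Central-difference estimate of dL/dP, treating loss_fn as a black box."""
    grad = np.zeros_like(P)
    flat = grad.ravel()  # flat view into grad, filled entry by entry
    for i in range(P.size):
        plus = P.copy().ravel()
        minus = P.copy().ravel()
        plus[i] += eps
        minus[i] -= eps
        # two loss evaluations per entry of P
        flat[i] = (loss_fn(plus.reshape(P.shape)) -
                   loss_fn(minus.reshape(P.shape))) / (2 * eps)
    return grad
```

Note that this costs two loss (i.e. simulator) evaluations per entry of P, so it is only practical when P is reasonably small.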

This is the algorithm:

  • set up your graph and multiply the output by a constant C
  • feed ones for C, run a forward pass, and get the prediction P
  • compute the loss at P with your external code
  • compute the numerical derivative of the loss with respect to P, i.e. dL/dP
  • feed that numerical derivative in as C, run the backward pass, and update the parameters
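The whole loop can be sketched end to end. This is a toy stand-in, not the notebook's code: NumPy with manual backprop replaces the TensorFlow graph, a one-layer linear model P = I @ W plays the role of G, and `black_box_loss` plays the role of the external simulator — all of these names and shapes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

I = rng.normal(size=(8, 3))        # batch of inputs
W = rng.normal(size=(3, 2))        # trainable parameters of the "graph"
target = rng.normal(size=(8, 2))   # only the black-box loss sees this

def black_box_loss(P):
    # stands in for the external simulator: callable, but not differentiable
    # by the framework
    return float(np.sum((P - target) ** 2))

def numerical_dL_dP(P, eps=1e-5):
    # central-difference estimate of dL/dP
    grad = np.zeros_like(P)
    flat = grad.ravel()
    for i in range(P.size):
        plus, minus = P.copy().ravel(), P.copy().ravel()
        plus[i] += eps
        minus[i] -= eps
        flat[i] = (black_box_loss(plus.reshape(P.shape)) -
                   black_box_loss(minus.reshape(P.shape))) / (2 * eps)
    return grad

lr = 0.01
initial_loss = black_box_loss(I @ W)
for step in range(300):
    P = I @ W                  # forward pass through the graph
    C = numerical_dL_dP(P)     # estimate dL/dP, to be fed in as C
    # backward pass through P~ = P * C: the gradient of sum(P~) w.r.t. W
    # is I.T @ C, which by the chain rule equals dL/dW
    W -= lr * (I.T @ C)
final_loss = black_box_loss(I @ W)
```

In real TensorFlow code, C would be a placeholder multiplied onto the output, and the optimizer's backward pass would replace the manual `I.T @ C` update; the principle is the same.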
lhk