For a university project, I want to train a (simulated) robot to hit a ball given the ball's position and velocity. The first thing to try is policy gradients: I have a parametric trajectory generator. For every training position, I feed the position through my network, send the resulting trajectory to the simulator, and get a reward back. I can now use that reward as the loss, sample an estimate of the gradient, feed it back, and update the weights of my network so that it does better next time.
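Concretely, the gradient I want to sample is (as far as I understand) the score-function (REINFORCE) estimator, where the network parameters $\theta$ define a stochastic policy $\pi_\theta$ over trajectory weights $w$, $s$ is the ball state, and $R(w)$ is the reward the simulator returns:

$$\nabla_\theta \, \mathbb{E}_{w \sim \pi_\theta(\cdot \mid s)}\big[R(w)\big] \;=\; \mathbb{E}_{w \sim \pi_\theta(\cdot \mid s)}\big[R(w)\, \nabla_\theta \log \pi_\theta(w \mid s)\big]$$

The appealing part is that $R$ only appears as a scalar weight, so the simulator itself never has to be differentiated.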
The goal, therefore, is to learn the mapping from position to trajectory weights. When using popular computation-graph libraries like Theano and TensorFlow (or Keras), my problem is that I do not know how to actually model this system. I want to start with standard fully connected layers, with the trajectory weights as the output. But how do I actually calculate the loss so that backprop can use it?
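For concreteness, here is a minimal sketch of the network part, written with the current `tf.keras` API. The input and output sizes are placeholders I made up: I am assuming a 6-dimensional ball state (3-D position plus 3-D velocity) and `NUM_WEIGHTS` trajectory-generator weights.

```python
import tensorflow as tf

NUM_INPUTS = 6    # assumption: 3-D position + 3-D velocity
NUM_WEIGHTS = 10  # assumption: size of the trajectory parametrization

# Fully connected network mapping ball state to trajectory weights
# (interpreted below as the mean of a Gaussian policy over weights).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_INPUTS,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_WEIGHTS),  # linear output: weight means
])
```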
In a custom loss function, I would ignore/not specify the true labels, run the simulator, and return the loss it reports. But from what I have read, the loss needs to be a symbolic Theano/TensorFlow expression. My loss is quite complicated, so I do not want to port it from the simulator into the network. How can I implement that? A further problem is then differentiating that loss, as I might need to sample to estimate the gradient.
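One way I can imagine implementing this without making the loss symbolic is to treat the simulator as a black box and only backprop through the log-probability of the sampled trajectory weights, i.e. the REINFORCE surrogate loss $-R(w)\log \pi_\theta(w \mid s)$. A sketch continuing the model above, using TensorFlow 2's eager `GradientTape`; `simulate()` is a made-up stand-in for the real simulator call:

```python
import numpy as np

optimizer = tf.keras.optimizers.Adam(1e-3)
SIGMA = 0.1  # fixed exploration noise around the network's output


def simulate(weights):
    # Placeholder for the real simulator: send `weights` over, get a
    # scalar reward back. Dummy quadratic reward for illustration only.
    return -float(np.sum(weights ** 2))


def train_step(state):  # state: np.ndarray of shape (NUM_INPUTS,)
    with tf.GradientTape() as tape:
        mean = model(state[None, :].astype("float32"))    # (1, NUM_WEIGHTS)
        # Sample trajectory weights from the Gaussian policy.
        sample = mean + tf.random.normal(tf.shape(mean)) * SIGMA
        # Black-box reward: no gradient flows through the simulator.
        reward = simulate(sample.numpy()[0])
        # log N(sample | mean, SIGMA) up to additive constants; the
        # sample is treated as fixed data via stop_gradient.
        log_prob = -tf.reduce_sum(
            (tf.stop_gradient(sample) - mean) ** 2) / (2 * SIGMA ** 2)
        loss = -reward * log_prob  # REINFORCE surrogate loss
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return reward
```

If this is on the right track, looping `train_step` over the training positions would perform stochastic gradient ascent on the expected reward; I gather that subtracting a baseline (e.g., a running mean of rewards) from `reward` is the usual trick to reduce the variance of this estimator.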