
I'm using joint positions from a Kinect camera as my state space, but I think it's going to be too large (25 joints × 30 frames per second) to just feed into SARSA or Q-learning.
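To give a sense of the size, here is roughly the state vector I'm describing (an illustrative Python sketch only; the frame stacking is just one idea, and I'd actually be working in .NET against the Kinect SDK):

```python
import numpy as np

NUM_JOINTS = 25  # a Kinect v2 body frame tracks 25 joints

def frame_to_vector(joints):
    """Flatten 25 (x, y, z) joint positions into a 75-float vector."""
    assert len(joints) == NUM_JOINTS
    return np.asarray(joints, dtype=np.float32).reshape(-1)  # shape: (75,)

def stack_frames(frames):
    """Concatenate the last few frames so the state captures motion, not just pose,
    e.g. 4 frames x 75 floats = a 300-dimensional state vector."""
    return np.concatenate([frame_to_vector(f) for f in frames])
```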

Right now I'm using the Kinect Gesture Builder program, which uses supervised learning to associate user movement with specific gestures. But that requires supervised training, which I'd like to move away from. I figure the algorithm might pick up the same associations between joints that I would when classifying the data myself (hands up, step left, step right, for example).

I think feeding that data into a deep neural network and then passing the result to a reinforcement learning algorithm might give me a better result.

There was a paper on this recently. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

I know Accord.net has both deep neural networks and RL, but has anyone combined them? Any insights?

ORobotics
  • What's your goal? What are your actions and the reward? – Simon Dec 14 '15 at 09:31
  • This is for a boxing robot. The user stands in front of it and fights it. If the robot punches the user it's a positive reward; if the user punches the robot it's a negative reward. The actions are sequences of punches that I define (action 1 might be left straight, right straight, left hook, for example). – ORobotics Dec 17 '15 at 22:17

1 Answer


If I understand correctly from your question + comment, what you want is an agent that performs discrete actions using a visual input (raw pixels from a camera). This looks exactly like what the DeepMind team did recently, extending the paper you mentioned. Have a look at this; it is the newer (and better) version of the Atari-playing work. They also provide an official implementation, which you can download here. There is even an implementation in Neon which works pretty well.

Finally, if you want to use continuous actions, you might be interested in this very recent paper.

To recap: yes, somebody has combined DNN + RL, it works, and if you want to use raw camera data to train an agent with RL, this is definitely one way to go :)
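To make the idea concrete, here is a minimal sketch of the kind of Q-network those papers use: a stack of preprocessed frames goes in, one Q-value per discrete action comes out. It's illustrative Python/PyTorch rather than Accord.net, and the layer sizes roughly follow the Atari setup, not anything tuned for your problem:

```python
import torch
import torch.nn as nn

class AtariStyleQNetwork(nn.Module):
    """DQN-style conv net: stacked 84x84 grayscale frames in,
    one Q-value per discrete action out."""
    def __init__(self, num_actions, in_frames=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, x):  # x: (batch, in_frames, 84, 84), pixels scaled to [0, 1]
        return self.head(self.conv(x))

# Sanity check: a batch of one observation maps to one Q-value per action.
q = AtariStyleQNetwork(num_actions=4)
print(q(torch.zeros(1, 4, 84, 84)).shape)  # torch.Size([1, 4])
```

The agent then just picks the action with the highest Q-value, with some epsilon-greedy exploration during training.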

Simon
  • I'm familiar with that paper, and it's part of my inspiration for using deep Q-learning. Unlike the paper, I'm not planning on using raw pixels. I'm going to use the coordinates for each of the 25 joints from the Kinect. The Kinect SDK does a great job using the depth sensor to recognize joints, so there is no need to re-create that functionality. The last paper you linked to (thanks for that!) says "However, while DQN solves problems with high-dimensional observation spaces, it can only handle discrete and low-dimensional action spaces." – ORobotics Dec 19 '15 at 00:26
  • I have a relatively low action space (probably 20 on the high side) so I think the standard DQN will work. Any advice or has anyone posted any code on how to do this in Accord.net? I'd rather not have to write some of that from scratch. – ORobotics Dec 19 '15 at 00:27
  • Yes, DQN can only deal with discrete action spaces, and 20 is still low-dimensional, so you will be fine. I don't have any experience with Accord.net though, so I cannot help regarding that. Anyway, implementing it should be quite straightforward; there is nothing really complex about it in the end (a rough sketch of the update is below). – Simon Dec 19 '15 at 01:54
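For reference, here is roughly what the DQN update looks like for the setup described above (a 75-dimensional joint-coordinate state and around 20 discrete actions). This is an illustrative Python/PyTorch sketch, not Accord.net code; the network width, replay size, learning rate, and sync interval are arbitrary placeholders:

```python
import copy
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM = 25 * 3   # 25 Kinect joints x (x, y, z) coordinates
NUM_ACTIONS = 20     # discrete punch sequences defined by hand

# A small MLP is plenty for a 75-dimensional state and ~20 actions.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, NUM_ACTIONS),
)
target_net = copy.deepcopy(q_net)                  # frozen copy used for TD targets
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
replay = deque(maxlen=100_000)                     # stores (s, a, r, s', done) tuples
GAMMA = 0.99

def train_step(batch_size=32):
    """One DQN update: regress Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states, actions, rewards, next_states, dones = map(
        lambda xs: torch.as_tensor(np.asarray(xs), dtype=torch.float32), zip(*batch)
    )
    q = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + GAMMA * (1 - dones) * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

During a bout you would append a (state, action, reward, next_state, done) tuple to `replay` after every action, call `train_step()` regularly, and copy `q_net`'s weights into `target_net` every few thousand steps.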