
I'm new to Reinforcement Learning. Recently, I've been trying to train a Deep Q Network to solve OpenAI gym's CartPole-v0, where solving means achieving an average score of at least 195.0 over 100 consecutive episodes.

I am using a 2-layer neural network, experience replay with a memory holding 1 million experiences, an epsilon-greedy policy, the RMSProp optimizer, and the Huber loss function.
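For reference, a minimal sketch of this kind of setup (this is not my actual code: the framework (PyTorch), layer sizes, and hyperparameters here are just illustrative placeholders, and it assumes the classic gym API):

```python
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v0")
n_obs, n_act = env.observation_space.shape[0], env.action_space.n

# 2-layer Q network and a separate target network
q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))
target_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))
target_net.load_state_dict(q_net.state_dict())

optimizer = torch.optim.RMSprop(q_net.parameters(), lr=1e-3)
loss_fn = nn.SmoothL1Loss()            # Huber loss
memory = deque(maxlen=1_000_000)       # experience replay buffer
gamma, epsilon, batch_size = 0.99, 0.1, 32


def act(state):
    """Epsilon-greedy action selection."""
    if random.random() < epsilon:
        return env.action_space.sample()
    with torch.no_grad():
        q = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q.argmax())


def train_step():
    """One gradient step on a random minibatch from the replay memory."""
    if len(memory) < batch_size:
        return
    s, a, r, s2, done = zip(*random.sample(memory, batch_size))
    s = torch.as_tensor(np.array(s), dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64)
    r = torch.as_tensor(r, dtype=torch.float32)
    s2 = torch.as_tensor(np.array(s2), dtype=torch.float32)
    done = torch.as_tensor(done, dtype=torch.float32)

    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # bootstrapped target from the target network
        target = r + gamma * (1.0 - done) * target_net(s2).max(1).values
    loss = loss_fn(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


for episode in range(500):
    state, done = env.reset(), False   # classic gym API: reset returns obs only
    while not done:
        action = act(state)
        next_state, reward, done, _ = env.step(action)
        memory.append((state, action, reward, next_state, float(done)))
        state = next_state
        train_step()
    if episode % 10 == 0:
        target_net.load_state_dict(q_net.state_dict())   # periodic target sync
```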

With this setup, solving the task is taking several thousand episodes (> 30k), and learning is also quite unstable at times. So, is it normal for Deep Q Networks to oscillate and take this long to learn a task like this? What other alternatives (or improvements on my DQN) could give better results?

W. Hawk
  • Here is a tutorial that will probably be helpful for your purposes. It uses OpenAI's CartPole problem and a neural network, like you do: https://pythonprogramming.net/openai-cartpole-neural-network-example-machine-learning-tutorial/ – Pablo EM Mar 14 '17 at 09:28
  • Thank you, @PabloEM. It is giving me some new insights. – W. Hawk Mar 20 '17 at 04:50
  • 1
    Great. In general, I'd guess Deep Q Learning is somewhat overkill for solving the CartPole task. – Pablo EM Mar 20 '17 at 08:07
  • How many training steps do 30k episodes amount to? DQN usually takes a long time to converge. Are you already using a target network? – BlueSun May 06 '17 at 12:38
  • The number of training steps per episode varied greatly, so it's not easy to say how many were taken over 30k episodes. And yes, I was using a target network. – W. Hawk May 07 '17 at 02:38

1 Answer


What other alternatives (or improvements on my DQN) can give better results?

In my experience, policy gradients work well on CartPole. They are also fairly easy to implement (if you squint, policy gradients almost look like supervised learning).

A good place to start: http://kvfrans.com/simple-algoritms-for-solving-cartpole/
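To illustrate the "almost supervised learning" point, here is a rough REINFORCE-style (vanilla policy gradient) sketch for CartPole. This is not the linked post's code: PyTorch, the network size, and the hyperparameters are my own placeholder choices, and it assumes the classic gym API.

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v0")
# small policy network: state -> action logits
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(2000):
    state = env.reset()                    # classic gym API: reset returns obs only
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # discounted returns, computed backwards from the end of the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize as a crude baseline

    # the "supervised-looking" step: maximize log-likelihood of the actions
    # actually taken, weighted by how good the resulting returns were
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```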

mynameisvinn