
I'm trying to solve the LunarLanderContinuous-v2 environment from OpenAI Gym (solving LunarLanderContinuous-v2 means getting an average reward of 200 over 100 consecutive trials), aiming for the best possible average reward over 100 consecutive episodes. The difficulty is that I work with an uncertain version of the lander (explanation: observations in the real physical world are often noisy). Specifically, I add zero-mean Gaussian noise with std = 0.05 to the PositionX and PositionY observations of the lander's location. I also discretise the LunarLander actions to a finite number of actions instead of the continuous range the environment allows.
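
For reference, here is roughly how I apply the noise and the discretisation, written as a Gym wrapper (a minimal sketch using the old Gym `reset`/`step` API; the class name and the 5×5 action grid are illustrative choices, not fixed parts of my setup):

```python
import itertools
import numpy as np
import gym


class NoisyDiscreteLunarLander(gym.Wrapper):
    """Adds Gaussian noise to the position observations and exposes a
    finite action set over the two continuous engine controls."""

    def __init__(self, env, n_bins=5, noise_std=0.05):
        super().__init__(env)
        self.noise_std = noise_std
        # Cartesian product of n_bins levels per engine -> n_bins**2 actions.
        levels = np.linspace(-1.0, 1.0, n_bins)
        self._actions = list(itertools.product(levels, levels))
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def _noisy(self, obs):
        obs = obs.copy()
        # Zero-mean Gaussian noise on PositionX (obs[0]) and PositionY (obs[1]).
        obs[:2] += np.random.normal(0.0, self.noise_std, size=2)
        return obs

    def reset(self, **kwargs):
        return self._noisy(self.env.reset(**kwargs))

    def step(self, action_idx):
        obs, reward, done, info = self.env.step(np.array(self._actions[action_idx]))
        return self._noisy(obs), reward, done, info


env = NoisyDiscreteLunarLander(gym.make("LunarLanderContinuous-v2"), n_bins=5)
```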

So far I have tried DQN, double DQN (DDQN), and duelling DDQN.
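
For reference, a minimal sketch of the duelling head I mean (in PyTorch; the hidden sizes and the 25-action output are placeholders, not my exact architecture):

```python
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Duelling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""

    def __init__(self, obs_dim=8, n_actions=25, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        a = self.advantage(h)
        # Subtract the mean advantage so V and A are identifiable.
        return self.value(h) + a - a.mean(dim=1, keepdim=True)
```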

My hyperparameters are (a sketch of how I bundle them follows the list):

  • gamma
  • epsilon start
  • epsilon end
  • epsilon decay
  • learning rate
  • number of actions (discretisation)
  • target update
  • batch size
  • optimizer
  • number of episodes
  • network architecture
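
For concreteness, this is roughly how I pass them around (the values shown are placeholders I am currently tuning, not settings I have found to work):

```python
# Placeholder values for illustration only; these are the knobs I'm tuning.
config = {
    "gamma": 0.99,
    "epsilon_start": 1.0,
    "epsilon_end": 0.01,
    "epsilon_decay": 0.995,        # multiplicative decay per episode
    "learning_rate": 1e-3,
    "n_actions": 25,               # discretisation of the 2D continuous actions
    "target_update": 1000,         # env steps between target-network syncs
    "batch_size": 64,
    "optimizer": "Adam",
    "n_episodes": 2000,
    "hidden_layers": [128, 128],   # network architecture
}
```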

I'm having difficulty reaching good, or even mediocre, results. Does anyone have advice on which hyperparameter changes I should make to improve my results? Thanks!

