I'm trying to solve the LunarLanderContinuous-v2 environment from OpenAI Gym (solving it means getting an average reward of 200 over 100 consecutive episodes) and to reach the best 100-episode reward average I can. The twist is that I treat the lander's observations as uncertain, since observations in the real physical world are often noisy. Specifically, I add zero-mean Gaussian noise with std = 0.05 to the PositionX and PositionY components of the observation. I also discretise the LunarLander actions into a finite set instead of the continuous range the environment allows.
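For context, this is roughly how I wrap the environment to add the noise and discretise the actions. It's a simplified sketch rather than my exact code: the wrapper name, the square grid of throttle levels, and the old 4-tuple Gym step API are just assumptions for illustration.

```python
import gym
import numpy as np


class NoisyDiscreteLunarLander(gym.Wrapper):
    """LunarLanderContinuous-v2 with noisy position readings and a discrete action set.

    Simplified sketch: the noise std and the action grid are illustrative choices,
    and the classic (pre-0.26) Gym API with 4-tuple step returns is assumed.
    """

    def __init__(self, n_bins_per_dim=5, pos_noise_std=0.05):
        super().__init__(gym.make("LunarLanderContinuous-v2"))
        self.pos_noise_std = pos_noise_std
        # Discretise each continuous engine command into n_bins_per_dim levels and
        # take the Cartesian product -> n_bins_per_dim ** 2 discrete actions.
        levels = np.linspace(-1.0, 1.0, n_bins_per_dim)
        self._actions = np.array(
            [[main, lateral] for main in levels for lateral in levels],
            dtype=np.float32,
        )
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def _noisy(self, obs):
        obs = obs.copy()
        # Zero-mean Gaussian noise on PositionX and PositionY only.
        obs[0] += np.random.normal(0.0, self.pos_noise_std)
        obs[1] += np.random.normal(0.0, self.pos_noise_std)
        return obs

    def reset(self, **kwargs):
        return self._noisy(self.env.reset(**kwargs))

    def step(self, action_index):
        obs, reward, done, info = self.env.step(self._actions[action_index])
        return self._noisy(obs), reward, done, info
```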
So far I've tried DQN, double DQN and dueling DDQN.
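For reference, the double-DQN part of my target computation follows the standard rule of selecting the next action with the online network and evaluating it with the target network. Here's a minimal PyTorch sketch (the batch layout and variable names are just illustrative, not my actual replay-buffer code):

```python
import torch


def double_dqn_targets(batch, online_net, target_net, gamma):
    """Compute double-DQN bootstrap targets for a batch of transitions.

    `batch` is assumed to hold float tensors (states, actions, rewards,
    next_states, dones) with `dones` in {0.0, 1.0}.
    """
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        # Action selection with the online network ...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... evaluation with the target network (the only change vs. vanilla DQN).
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q
    return targets
```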
My hyperparameters are:
- gamma
- epsilon start
- epsilon end
- epsilon decay
- learning rate
- number of actions (discretisation)
- target network update frequency
- batch size
- optimizer
- number of episodes
- network architecture
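For completeness, my dueling head follows the standard value/advantage decomposition. Here's a minimal PyTorch sketch of the kind of architecture I mean (the two hidden layers of 128 units and the action count of 9 are illustrative, not my actual settings):

```python
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk, then separate value and advantage streams."""

    def __init__(self, obs_dim=8, n_actions=9, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Standard aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)
```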
I'm having difficulty reaching good, or even mediocre, results. Does anyone have advice on which hyperparameter changes I should make to improve my results? Thanks!