I'm trying to solve the LunarLanderContinuous-v2 environment from OpenAI Gym (solving it means getting an average reward of 200 over 100 consecutive episodes) and to reach the best 100-episode reward average I can. The twist is that I treat the lander's observations as uncertain, since observations in the real physical world are often noisy. Specifically, I add zero-mean Gaussian noise with std = 0.05 to the PositionX and PositionY components of the observation. I also discretise the LunarLander actions into a finite set instead of the continuous range the environment allows.
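For context, this is roughly how I wrap the environment to add the noise and discretise the actions. It's a simplified sketch rather than my exact code: the wrapper name, the square grid of throttle levels, and the old 4-tuple Gym step API are just assumptions for illustration.

```python
import gym
import numpy as np


class NoisyDiscreteLunarLander(gym.Wrapper):
    """LunarLanderContinuous-v2 with noisy position readings and a discrete action set.

    Simplified sketch: the noise std and the action grid are illustrative choices,
    and the classic (pre-0.26) Gym API with 4-tuple step returns is assumed.
    """

    def __init__(self, n_bins_per_dim=5, pos_noise_std=0.05):
        super().__init__(gym.make("LunarLanderContinuous-v2"))
        self.pos_noise_std = pos_noise_std
        # Discretise each continuous engine command into n_bins_per_dim levels and
        # take the Cartesian product -> n_bins_per_dim ** 2 discrete actions.
        levels = np.linspace(-1.0, 1.0, n_bins_per_dim)
        self._actions = np.array(
            [[main, lateral] for main in levels for lateral in levels],
            dtype=np.float32,
        )
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def _noisy(self, obs):
        obs = obs.copy()
        # Zero-mean Gaussian noise on PositionX and PositionY only.
        obs[0] += np.random.normal(0.0, self.pos_noise_std)
        obs[1] += np.random.normal(0.0, self.pos_noise_std)
        return obs

    def reset(self, **kwargs):
        return self._noisy(self.env.reset(**kwargs))

    def step(self, action_index):
        obs, reward, done, info = self.env.step(self._actions[action_index])
        return self._noisy(obs), reward, done, info
```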
So far I've tried DQN, double DQN and dueling DDQN.
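For reference, the double-DQN part of my target computation follows the standard rule of selecting the next action with the online network and evaluating it with the target network. Here's a minimal PyTorch sketch (the batch layout and variable names are just illustrative, not my actual replay-buffer code):

```python
import torch


def double_dqn_targets(batch, online_net, target_net, gamma):
    """Compute double-DQN bootstrap targets for a batch of transitions.

    `batch` is assumed to hold float tensors (states, actions, rewards,
    next_states, dones) with `dones` in {0.0, 1.0}.
    """
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        # Action selection with the online network ...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... evaluation with the target network (the only change vs. vanilla DQN).
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q
    return targets
```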
My hyperparameters are:
- gamma
- epsilon start
- epsilon end
- epsilon decay
- learning rate
- number of actions (discretisation)
- target network update frequency
- batch size
- optimizer
- number of episodes
- network architecture
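For completeness, my dueling head follows the standard value/advantage decomposition. Here's a minimal PyTorch sketch of the kind of architecture I mean (the two hidden layers of 128 units and the action count of 9 are illustrative, not my actual settings):

```python
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk, then separate value and advantage streams."""

    def __init__(self, obs_dim=8, n_actions=9, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Standard aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)
```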
I'm having difficulty reaching good, or even mediocre, results. Does anyone have advice on which hyperparameter changes I should make to improve my results? Thanks!