
I implemented a DQN agent, and after some hours of training the reward is steady at 20-21.
When I watch the agent play, I can see that the same move is played again and again: on reset, the env always shoots the ball in the same direction, and my agent has learned to play that exact move and never lose.
Is this the expected behavior of the gym Pong env? How can I make the env reset more randomly? I'm using the NoopResetEnv wrapper and it doesn't help!


1 Answer


The agent acting the same way every time can be traced to two sources: the model itself and the Pong env.

On the model side: if you are training a DQN, the vanilla DQN policy is deterministic at evaluation time, meaning it will always pick the same action in the same state. What you can try is to keep a little randomness during evaluation, e.g. take a random action with probability 0.1 (evaluation-time epsilon-greedy). In Stable Baselines, for example, you can get stochastic predictions by passing deterministic=False to model.predict.
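A minimal sketch of evaluation-time epsilon-greedy, assuming a Stable Baselines-style model.predict API; the helper name and the 0.1 epsilon are my own choices, not something from the question:

```python
import random

def epsilon_greedy_action(model, obs, action_space, epsilon=0.1):
    # With probability epsilon, take a uniformly random action so the
    # episode does not replay the exact same trajectory every time.
    if random.random() < epsilon:
        return action_space.sample()
    # Otherwise act greedily; model.predict follows the Stable Baselines
    # convention of returning (action, hidden_state).
    action, _state = model.predict(obs, deterministic=True)
    return action
```

Calling this in your evaluation loop instead of a bare model.predict(obs, deterministic=True) is usually enough to break the repeated-move loop.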

On the env side: I have not tried it myself, but the openai gym Atari envs take a seed, which you can set with env.seed(your_seed). See the gym documentation and the GitHub repo for more details.
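I have not verified this on Pong, but building on the env.seed call above, here is a sketch that re-seeds the env with a fresh value before each evaluation episode (the env id and seed range are illustrative, and it uses the classic gym API that the question's wrappers imply):

```python
import random

import gym

env = gym.make("PongNoFrameskip-v4")  # illustrative env id

for episode in range(5):
    # A different seed per episode; a fixed seed would instead make
    # every reset identical.
    env.seed(random.randrange(2**31))
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # stand-in for your agent
        obs, reward, done, info = env.step(action)
```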
