6

I am working on CartPole-v0 from OpenAI Gym. I noticed that my program always resets after 200 steps. If I sum the rewards from an episode, where the maximum reward is 1.0 per timestep, I never get more than 200. Is there a configuration option in the gym library that I might have missed? Has anybody else run into this?

desert_ranger
Abel

1 Answer

9

CartPole-v0 gives a reward of 1.0 for every step your agent is "alive".

The environment is registered with these lines of code:

register(
    id='CartPole-v0',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    max_episode_steps=200,
    reward_threshold=195.0,
)

which, in the current version of the repository, can be found here.

That max_episode_steps=200 means that an episode is automatically ended after 200 steps. Since each step gives a reward of at most 1.0, the maximum score you can get is 200.
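To illustrate the mechanism, here is a minimal sketch in plain Python. It is not Gym's actual implementation: `AlwaysAliveEnv`, `StepLimit`, and `run_episode` are hypothetical names made up for this example, and the toy environment simply assumes the agent never fails. It only shows why a step limit of 200 caps the summed reward at 200.

```python
class AlwaysAliveEnv:
    """Hypothetical toy environment: never fails, pays 1.0 per step."""

    def reset(self):
        return 0  # dummy observation

    def step(self, action):
        # (observation, reward, done, info), the classic Gym step signature
        return 0, 1.0, False, {}


class StepLimit:
    """Simplified stand-in for a time-limit wrapper (not Gym's own code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            done = True  # force the episode to end at the step limit
        return obs, reward, done, info


def run_episode(env):
    """Play one episode with a fixed action and return the summed reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(0)
        total += reward
    return total


if __name__ == "__main__":
    # Even a "perfect" agent cannot collect more than the step limit.
    print(run_episode(StepLimit(AlwaysAliveEnv(), max_episode_steps=200)))  # prints 200.0
```

Raising the limit in this sketch (for example to 500) raises the maximum return by the same amount, which is the same idea as changing the step limit on the real environment.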

Dennis Soemers
    I solved this with **`env._max_episode_steps = 500`**, as found [here](https://github.com/openai/gym/issues/463#issuecomment-389873434). Calling **`env.reset()`** will also reset the score, so you may want to write a **`def env_reset():`** function as well. – Avio Apr 24 '19 at 15:38