I was working on CartPole-v0 provided by OpenAI Gym. I noticed that my program always resets after 200 steps. If I sum all the rewards from an episode, where the maximum reward is 1.0 per timestep, I never get more than 200. I was wondering if there is any configuration I might have missed in the gym library. Has anybody else run into this problem?
1 Answer
`CartPole-v0` gives a reward of `1.0` for every step your agent is "alive".
The environment is registered with these lines of code:
    register(
        id='CartPole-v0',
        entry_point='gym.envs.classic_control:CartPoleEnv',
        max_episode_steps=200,
        reward_threshold=195.0,
    )
which, in the current version of the repository, can be found here.
That `max_episode_steps=200` means that an episode automatically terminates after 200 steps. So, the maximum score you can get is 200.
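The effect of the step cap can be sketched in plain Python, without gym itself. `AlwaysAliveEnv`, `StepCap`, and `run_episode` below are hypothetical stand-ins written for illustration, not gym's actual classes. They assume a reward of 1.0 per step and a cap of 200 steps, as in the registration above.

```python
class AlwaysAliveEnv:
    """Hypothetical stand-in for CartPole in which the pole never falls."""

    def reset(self):
        return 0  # dummy observation

    def step(self, action):
        # Returns (observation, reward, done): 1.0 per step, never done on its own.
        return 0, 1.0, False


class StepCap:
    """Hypothetical wrapper that ends an episode after max_episode_steps steps."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            done = True  # forced termination by the step cap
        return obs, reward, done


def run_episode(env):
    """Run one episode with a fixed action and return the summed reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(0)
        total += reward
    return total


episode_return = run_episode(StepCap(AlwaysAliveEnv(), max_episode_steps=200))
print(episode_return)
```

Even with an agent that never fails, the summed reward in this sketch cannot exceed the cap, which matches the behaviour described in the question.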

Dennis Soemers
I solved this with **`env._max_episode_steps = 500`**, as found [here](https://github.com/openai/gym/issues/463#issuecomment-389873434). Calling **`env.reset()`** will also reset the score, so you may want to write a **`def env_reset():`** function as well. – Avio Apr 24 '19 at 15:38
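An alternative to changing the private attribute is to register a separate environment id with a larger step limit, mirroring the `register(...)` call quoted in the answer. This is a minimal sketch, assuming the classic `gym` package is installed. The id `CartPoleLong-v0` and the name `LONG_CARTPOLE_SPEC` are made up for illustration, and the value 500 is taken from the comment above.

```python
# Hypothetical spec: same entry point as CartPole-v0, but a longer step limit.
LONG_CARTPOLE_SPEC = dict(
    id="CartPoleLong-v0",  # made-up id, not shipped with gym
    entry_point="gym.envs.classic_control:CartPoleEnv",
    max_episode_steps=500,
)

try:
    from gym.envs.registration import register

    register(**LONG_CARTPOLE_SPEC)
    registered = True
except ImportError:
    # gym is not installed; the spec above is still valid as a plain dict.
    registered = False

print(registered, LONG_CARTPOLE_SPEC["max_episode_steps"])
```

If the registration succeeds, `gym.make('CartPoleLong-v0')` should then build an environment whose episodes are cut off after 500 steps instead of 200.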