
I created a custom environment based on Gymnasium and I want to train it with PPO from stable_baselines3. I'm using version 2.0.0a5 of stable_baselines3 so that it works with Gymnasium. I have the following code:

    from stable_baselines3 import PPO

    env = MyEnv()  # my custom Gymnasium environment
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1, progress_bar=True)

This code does not stop: the progress bar goes past the total number of time steps and just keeps going. I may be doing something wrong in the environment, but I am not sure what, or why that would make the learning process run more iterations than the total_timesteps set by the user.

So, what could be wrong with the environment? What should I check that could make the learning process run forever?
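One thing worth checking is whether an episode of the environment ever ends. The sketch below is a hypothetical helper (`episode_terminates` is not part of Gymnasium or stable_baselines3) that steps the env with random actions and reports whether `terminated` or `truncated` ever becomes true. It assumes the Gymnasium step/reset API. For API conformance (spaces, return types) the real checkers are `stable_baselines3.common.env_checker.check_env` and `gymnasium.utils.env_checker.check_env`; as far as I know, they do not verify that episodes actually finish.

```python
from typing import Any


def episode_terminates(env: Any, max_steps: int = 10_000, seed: int = 0) -> bool:
    """Return True if one random-action episode ends within max_steps steps.

    Assumes the Gymnasium API: reset(seed=...) returns (obs, info), step()
    returns (obs, reward, terminated, truncated, info), and env.action_space
    has a sample() method.
    """
    env.reset(seed=seed)
    for _ in range(max_steps):
        action = env.action_space.sample()
        _, _, terminated, truncated, _ = env.step(action)
        # An episode only ends when one of these flags is truthy; if the env
        # never sets them, anything waiting for the episode to end will hang.
        if terminated or truncated:
            return True
    return False


# Toy environments for illustration only (not part of any library).
class _ZeroAction:
    def sample(self) -> int:
        return 0


class EndsAfterFive:
    action_space = _ZeroAction()

    def reset(self, seed=None):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        return 0, 0.0, self.t >= 5, False, {}


class NeverEnds(EndsAfterFive):
    def step(self, action):
        return 0, 0.0, False, False, {}


print(episode_terminates(EndsAfterFive()))  # True
print(episode_terminates(NeverEnds()))      # False (gives up after 10,000 steps)
```

Running the same check on `MyEnv()` (with its real `action_space`) would show quickly whether its episodes can end at all.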

Edit: the plot thickens. I tried the same thing with an SAC agent and it does not loop infinitely during learning, but it does during evaluation!
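A plausible explanation, offered as an assumption rather than something confirmed here: evaluation helpers such as SB3's `evaluate_policy` keep stepping until `n_eval_episodes` episodes have finished, so if `MyEnv` never returns `terminated=True` or `truncated=True`, evaluation never returns, whereas training rollouts have a fixed length in steps. The usual remedy is to truncate episodes after a step limit, for example with `gymnasium.wrappers.TimeLimit`. The sketch below uses a hypothetical pure-Python stand-in (`StepLimit`) and a simplified evaluation loop (`run_episodes`) to show the idea; neither is a library API.

```python
from typing import Any, List


class StepLimit:
    """Hypothetical minimal stand-in for gymnasium.wrappers.TimeLimit.

    Forces truncated=True once max_episode_steps steps have been taken,
    so any loop that waits for an episode to end is guaranteed to stop.
    """

    def __init__(self, env: Any, max_episode_steps: int) -> None:
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.action_space = env.action_space
        self._elapsed = 0

    def reset(self, **kwargs):
        self._elapsed = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


def run_episodes(env: Any, n_episodes: int) -> List[int]:
    """Simplified evaluation loop: return the length of each finished episode.

    Like any loop that waits for n_episodes to finish, this never returns
    if the environment never reports terminated or truncated.
    """
    lengths = []
    for _ in range(n_episodes):
        env.reset()
        steps, done = 0, False
        while not done:
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            steps += 1
            done = terminated or truncated
        lengths.append(steps)
    return lengths


class _FixedAction:
    def sample(self) -> int:
        return 0


class NeverEndingEnv:
    """Toy env that, like a buggy custom env, never ends an episode."""

    action_space = _FixedAction()

    def reset(self, **kwargs):
        return 0, {}

    def step(self, action):
        return 0, 0.0, False, False, {}


# Without the wrapper, run_episodes(NeverEndingEnv(), 3) would hang forever.
print(run_episodes(StepLimit(NeverEndingEnv(), max_episode_steps=200), 3))  # [200, 200, 200]
```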

Benares
Maybe what you saw was the rollout collection. In that case PPO first collects `n_steps` environment steps per rollout (2048 by default). Only after that does it train, and then it quits immediately since `total_timesteps=1`. – gehirndienst Apr 20 '23 at 11:46
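To make the comment's arithmetic concrete, here is a simplified model of how many environment steps an on-policy learner like PPO takes. It is a sketch, not SB3 code: it assumes `learn()` keeps collecting full rollouts of `n_steps * n_envs` transitions while the step counter is below `total_timesteps`, and that no callback stops training early. `timesteps_actually_run` is a hypothetical helper.

```python
import math


def timesteps_actually_run(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> int:
    """Estimate env steps taken by an on-policy learner that only stops
    between full rollouts of n_steps * n_envs transitions (simplified model)."""
    rollout_size = n_steps * n_envs
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    return n_rollouts * rollout_size


print(timesteps_actually_run(1))              # 2048: one full default rollout
print(timesteps_actually_run(1, n_steps=64))  # 64
print(timesteps_actually_run(5000))           # 6144: three rollouts
```

Under this model, passing a smaller rollout length, e.g. `PPO("MlpPolicy", env, n_steps=64)`, should make `learn(total_timesteps=1)` stop after about 64 steps instead of 2048.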
