How to stop the learning process with PPO in stable-baselines3?

Asked Apr 19 '23 at 10:47

Active Apr 19 '23 at 12:25

Viewed 57 times

I created a custom environment based on gymnasium and I want to train a PPO agent from stable_baselines3 on it. I'm using version 2.0.0a5 of stable_baselines3 in order to use gymnasium. I have the following code:

from stable_baselines3 import PPO

env = MyEnv()  # custom gymnasium environment
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1, progress_bar=True)

This code does not stop: the progress bar goes past the total number of time steps and just keeps going. I may be doing something wrong in the environment, but I am not sure what, or why it would make the learning process run for more iterations than the total_timesteps set by the user.

So, what could be wrong with the environment? What should I check that could make the learning process run forever?

Edit: the plot thickens. I tried the same thing with an SAC agent and it does not go into an infinite loop during learning. But it does get stuck in one during evaluation!
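
One possible explanation for a loop that only shows up during evaluation is an environment that never reports the end of an episode. The sketch below does not use stable_baselines3 at all; `NeverEndingEnv` and `run_episode` are hypothetical names made up for illustration, and the assumption that the custom environment never returns `terminated` or `truncated` as true is only a guess. It shows that an episode-based loop can only finish if the environment ends the episode or something outside it caps the number of steps.

```python
class NeverEndingEnv:
    """Hypothetical stand-in for a custom env whose episodes never finish."""

    def reset(self):
        # gymnasium-style reset: (observation, info)
        return 0, {}

    def step(self, action):
        # gymnasium-style step: (obs, reward, terminated, truncated, info)
        # Assumption for illustration: both end-of-episode flags stay False.
        return 0, 1.0, False, False, {}


def run_episode(env, max_steps=None):
    """Step until the env reports the episode is over, or until the optional cap."""
    env.reset()
    steps = 0
    while True:
        _, _, terminated, truncated, _ = env.step(0)
        steps += 1
        if terminated or truncated:
            return steps, "env ended the episode"
        if max_steps is not None and steps >= max_steps:
            return steps, "hit external step cap"


# Without max_steps this call would never return for NeverEndingEnv.
print(run_episode(NeverEndingEnv(), max_steps=1000))
```

If this is what is happening, checking that `step()` eventually returns `terminated=True` or `truncated=True`, or wrapping the environment in gymnasium's `TimeLimit` wrapper with a `max_episode_steps` value, would be a reasonable thing to try.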

Benares

Maybe what you saw was rollout collection. PPO collects a full rollout of `n_steps` steps (2048 by default) before each update, regardless of `total_timesteps`. Only after that does it train, and then it quits immediately because `total_timesteps=1` – gehirndienst Apr 20 '23 at 11:46
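
To make the arithmetic in this comment concrete, here is a minimal sketch in plain Python (no stable_baselines3 involved). `steps_actually_collected` is a hypothetical helper, and it assumes the step budget is only checked between rollouts, with each rollout taking `n_steps` steps in each of `n_envs` environments.

```python
def steps_actually_collected(total_timesteps, n_steps=2048, n_envs=1):
    """Count environment steps taken when the budget is checked only between rollouts."""
    num_timesteps = 0
    while num_timesteps < total_timesteps:
        # One full rollout is always collected before the budget is checked again.
        num_timesteps += n_steps * n_envs
    return num_timesteps


print(steps_actually_collected(1))             # 2048
print(steps_actually_collected(5000))          # 6144
print(steps_actually_collected(1, n_steps=8))  # 8
```

Under that assumption, `total_timesteps=1` still means 2048 environment steps, which would look like a progress bar running far past its total. Passing a smaller `n_steps` to the `PPO` constructor should make the run stop sooner.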
