StableBaselines3 / PPO / model rolls out but does not learn?

Question

When a model learns, there are two phases:

A rollout phase
A learning phase

My models are rolling out, but they never show a learning phase. This is apparent both in the text output of a Jupyter notebook in VS Code and in TensorBoard.

I built a very simple environment and tried many more timesteps. What I discovered was:

If there are too few timesteps, the model never displays that it learns.

What is the minimum number of timesteps to learn?
Is this the same for all environments or does it depend upon your environment?

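import os
import time

from stable_baselines3 import PPO

# `env` is the custom environment created earlier in the notebook
tic = time.perf_counter()

log_path = os.path.join('Training', 'Logs')
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_path)
# learn() returns the trained model itself
modelRtn = model.learn(total_timesteps=1000, progress_bar=True)

toc = time.perf_counter()
print("Elapsed time:  " + str(toc - tic) + " sec")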

score 1 · Accepted Answer · answered Feb 13 '23 at 08:49


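Your PPO has an n_steps parameter that is 2048 by default. collect_rollouts fills the rollout buffer with n_steps transitions per environment (2048 here) before execution returns to your learn method, and by then the 2048 collected timesteps already exceed the limit of 1000 you set for the whole run, so the loop in learn ends after this first iteration. One training update does run at the end of that iteration, but its train/ statistics are only printed (and written to TensorBoard) at the next log dump, which never comes. You therefore need total_timesteps greater than n_steps times the number of environments (2049 with the defaults) to see a learning phase; the threshold depends on n_steps, not on the environment itself.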


gehirndienst

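To make the answer concrete, here is a simplified sketch in plain Python of the control flow it describes. It mirrors, as I understand it, the loop in SB3's `OnPolicyAlgorithm.learn` with the default `log_interval=1`. The functions `simulate_learn` and `min_timesteps_to_see_training` are hypothetical helpers for illustration only, not part of Stable-Baselines3, and episode statistics are ignored.

```python
def simulate_learn(total_timesteps, n_steps=2048, n_envs=1):
    """Simplified, hypothetical model of SB3's on-policy learn() loop.

    Assumes log_interval=1. Returns one (num_timesteps, shows_train) tuple per
    log dump, where shows_train says whether train/ metrics appear in that dump.
    """
    num_timesteps = 0
    trained_since_dump = False  # has train() recorded metrics not yet printed?
    dumps = []
    while num_timesteps < total_timesteps:
        # collect_rollouts(): always fills the whole buffer, even past total_timesteps
        num_timesteps += n_steps * n_envs
        # log dump: rollout/ and time/ stats, plus train/ only if an update already ran
        dumps.append((num_timesteps, trained_since_dump))
        # train(): one update phase; its metrics wait for the next dump
        trained_since_dump = True
    return dumps


def min_timesteps_to_see_training(n_steps=2048, n_envs=1):
    """Smallest total_timesteps that forces a second rollout (a dump with train/ stats)."""
    return n_steps * n_envs + 1


print(simulate_learn(1000))  # [(2048, False)]  -> one rollout, no train/ section
print(simulate_learn(5000))  # [(2048, False), (4096, True), (6144, True)]
print(min_timesteps_to_see_training())  # 2049
```

With the defaults, anything up to 2048 timesteps gives exactly one rollout and no visible learning phase; 2049 is the first value that triggers a second rollout. With `n_steps=256`, the same 1000 timesteps would already give four rollouts.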

How did you figure that out? The documentation for the `learn` method does not appear to mention this (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html). I see that PPO has a `collect_rollouts` method. So you can't set this value in `learn`; do you have to set it in `collect_rollouts`? What is the best way for me to learn these details? I certainly want to learn faster than an AI algorithm... – user3533030 Feb 14 '23 at 15:17

