When a model trains, there are two phases:
- A rollout phase
- A learning phase
My models roll out, but they never show a learning phase. This is apparent both in the text output in a Jupyter notebook in VS Code and in TensorBoard.
I built a very simple environment and tried many more timesteps. What I discovered was that if there are too few timesteps, the output never shows a learning phase. That leaves two questions:
- What is the minimum number of timesteps needed for the learning phase to show up?
- Is this the same for all environments, or does it depend on the environment?
import os
import time

from stable_baselines3 import PPO

# `env` is the simple custom environment built earlier.
tic = time.perf_counter()
log_path = os.path.join('Training', 'Logs')
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_path)
# learn() returns the trained model itself.
modelRtn = model.learn(total_timesteps=1000, progress_bar=True)
toc = time.perf_counter()
print(f"Elapsed time: {toc - tic} sec")
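
A rough sketch of the arithmetic that may be behind this, under assumptions not confirmed in the question: Stable Baselines3's PPO collects `n_steps * n_envs` timesteps per rollout (the default `n_steps` is 2048), and the `train/` metrics from one update appear to be written only with the next iteration's log dump. If that holds, a run needs at least two rollout iterations before a learning section is printed. The helper names `rollout_iterations` and `train_metrics_visible` are hypothetical and not part of the library.

```python
import math


def rollout_iterations(total_timesteps, n_steps=2048, n_envs=1):
    """Estimate how many rollout/update iterations a run performs.

    Assumes each iteration collects n_steps * n_envs timesteps and that
    iterations continue until total_timesteps has been reached.
    """
    if total_timesteps <= 0:
        return 0
    steps_per_rollout = n_steps * n_envs
    return math.ceil(total_timesteps / steps_per_rollout)


def train_metrics_visible(total_timesteps, n_steps=2048, n_envs=1):
    """Assumption: train/ metrics are only dumped from the second iteration on."""
    return rollout_iterations(total_timesteps, n_steps, n_envs) >= 2


for steps in (1000, 2048, 2049, 4096):
    print(steps, rollout_iterations(steps), train_metrics_visible(steps))
```

Under these assumptions the threshold depends on `n_steps` and the number of parallel environments, not on the environment's dynamics, so passing a smaller `n_steps` to `PPO` would lower it.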