In the sample code from the Stable Baselines3 documentation (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html), the model is first trained with the line model.learn(total_timesteps=25000),
and only after that is it used in the playing loop.
I want to monitor several parameters (from a custom env) while the agent is still learning, so my question is: how can I use model.learn
inside the playing loop? The sample code is:
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
# Parallel environments
env = make_vec_env("CartPole-v1", n_envs=4)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25000)
model.save("ppo_cartpole")
del model # remove to demonstrate saving and loading
model = PPO.load("ppo_cartpole")
obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()