
In the sample code from the Stable Baselines3 website (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html), the model first learns via the model.learn(total_timesteps=25000) line, and only then is it used in the playing loop.

Now, since I want to monitor different parameters (from a custom env) while the agent progresses through its learning, my question is: how can I use model.learn inside the playing loop?

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Parallel environments
env = make_vec_env("CartPole-v1", n_envs=4)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25000)
model.save("ppo_cartpole")

del model # remove to demonstrate saving and loading

model = PPO.load("ppo_cartpole")

# Play the trained agent
obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
mac179

1 Answer


The playing loop used for training contains many operations that are specific to the algorithm in use (e.g. PPO). Such playing loops are called rollouts; you can find the rollout function collect_rollouts in stable_baselines3.common.on_policy_algorithm.OnPolicyAlgorithm. So it is better not to write your own training loop when the framework already provides one for you.

To track various parameters (including custom ones), look at callbacks (https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html). A callback can be passed into the learning call as model.learn(total_timesteps=25000, callback=custom_callback). Moreover, if you just want to play with a learned model, you can use the evaluation function instead of learning, again with a callback for tracking parameters:

from stable_baselines3.common.evaluation import evaluate_policy
...
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25000)
evaluate_policy(model.policy, env, n_eval_episodes=10, deterministic=True, callback=custom_callback)
Mikhail