I want to compare a trained model in Stable-Baselines3 (SB3) with another one (a base algorithm) and see how it performs on the same episode. However, I am having issues with the evaluate_policy function when using BasePolicy.
Here is a small reproducible example:
import gym
from stable_baselines3 import SAC
from stable_baselines3.common.policies import BasePolicy
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Pendulum-v1")
env.seed(123456)

# Train a SAC agent briefly
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1000, log_interval=4)

# Evaluate the trained model
reward_list, episode_list = evaluate_policy(model, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)

# Evaluate a 'base' policy -- this line raises the TypeError below
reward_list_base, episode_list_base = evaluate_policy(BasePolicy, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)
I am getting an error on the last line: TypeError: BasePolicy.predict() missing 1 required positional argument: 'self'. I would like to compare the trained policy with a 'Do Nothing' policy starting from the same initial state.
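For context, this is roughly the 'Do Nothing' behaviour I have in mind: a hypothetical DoNothingPolicy that always returns a zero action. I am not sure whether evaluate_policy accepts a plain object like this in place of a trained model or BasePolicy instance, so treat it as a sketch of the intent:

import numpy as np

class DoNothingPolicy:
    # Hypothetical policy that always outputs a zero action (no torque for Pendulum-v1)
    def __init__(self, action_space):
        self.action_space = action_space

    def predict(self, observation, state=None, episode_start=None, deterministic=True):
        # evaluate_policy passes a batch of observations (one per env),
        # so return one zero action per observation
        n_envs = observation.shape[0] if observation.ndim > 1 else 1
        actions = np.zeros((n_envs, *self.action_space.shape), dtype=self.action_space.dtype)
        return actions, None

do_nothing = DoNothingPolicy(env.action_space)
reward_list_base, episode_list_base = evaluate_policy(do_nothing, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)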
Additionally, one issue with this setup is that every time I run evaluate_policy on the trained model, reward_list changes, even though I have fixed the environment seed.
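To illustrate the second issue, calling evaluate_policy twice in a row on the trained model gives two different reward lists:

rewards_run1, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)
rewards_run2, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)

# I expected these to match because of env.seed(123456), but they differ between runs
print(rewards_run1)
print(rewards_run2)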
Any help in clarifying my doubts would be appreciated.