
I want to compare a model trained with stable-baselines3 (SB3) against a baseline algorithm and see how each performs on the same episode. However, I am having issues using the evaluate_policy function with BasePolicy.

Here is a small reproducible example:

import gym
from stable_baselines3 import SAC
from stable_baselines3.common.policies import BasePolicy
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Pendulum-v1")
env.seed(123456)
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1000, log_interval=4)

reward_list, episode_list = evaluate_policy(model, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)
reward_list_base, episode_list_base = evaluate_policy(BasePolicy, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)

The last line raises TypeError: BasePolicy.predict() missing 1 required positional argument: 'self'. I would like to compare the trained policy with a 'Do Nothing' policy starting from the same initial state.

Additionally, every time I run evaluate_policy on the trained model, reward_list changes even though I have fixed the environment seed.

Any help in clarifying my doubts would be appreciated.

APaul31

1 Answer

  1. First, you're passing the class object itself instead of an instance of the class. But even instantiating it won't work, because BasePolicy is an abstract class: you need to define a subclass that implements the _predict method.

  2. Second, SAC is stochastic by nature: its actions are sampled from probability distributions, so its output will not always be the same.
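For point 1, evaluate_policy really only needs an object whose predict method matches the SB3 policy interface, so for a 'Do Nothing' baseline you can sidestep BasePolicy entirely with duck typing. A minimal sketch (the class name and zero-action choice are mine, not from SB3):

```python
import numpy as np

class DoNothingPolicy:
    """A 'Do Nothing' baseline that always outputs zero actions.

    evaluate_policy only calls model.predict(obs, state=..., episode_start=...,
    deterministic=...), so subclassing BasePolicy is not required.
    """

    def __init__(self, action_shape):
        # e.g. (1, 1) for a single vectorized Pendulum-v1 env: (n_envs, action_dim)
        self.action_shape = action_shape

    def predict(self, observation, state=None, episode_start=None, deterministic=True):
        # Zero action for every observation; no recurrent state to carry over.
        return np.zeros(self.action_shape, dtype=np.float32), state
```

Something like evaluate_policy(DoNothingPolicy((1, 1)), model.get_env(), n_eval_episodes=10, return_episode_rewards=True) should then run without the TypeError. Note that evaluate_policy feeds batched observations from a VecEnv, so the returned action should be batched accordingly: (n_envs,) + action_space.shape.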
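To illustrate point 2: SAC's actor outputs a distribution over actions, and sampling from it gives a different action each call even from the same state, which is why the episode rewards vary. A toy numpy sketch of the idea (illustrative only, not SB3 internals; mean and std stand in for the actor's outputs):

```python
import numpy as np

# A stochastic policy head defines a distribution over actions.
mean, std = 0.5, 0.2

rng = np.random.default_rng()
sampled_action = rng.normal(mean, std)   # stochastic: differs run to run
mean_action = mean                       # deterministic: always the same

# Seeding the sampler makes even the stochastic path reproducible:
a1 = np.random.default_rng(123).normal(mean, std)
a2 = np.random.default_rng(123).normal(mean, std)
assert a1 == a2
```

In SB3, both model.predict and evaluate_policy accept a deterministic flag that selects the mean action instead of sampling, which is usually what you want when comparing policies on the same episode.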

gehirndienst