I want to compare a trained model in Stable-Baselines3 (SB3) with another one (a base algorithm) and see how it performs on the same episode. However, I am having issues with the evaluate_policy function when using BasePolicy.
Here is a small reproducible example:
import gym
from stable_baselines3 import SAC
from stable_baselines3.common.policies import BasePolicy
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Pendulum-v1")
env.seed(123456)

# Train a SAC agent briefly
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1000, log_interval=4)

# Evaluate the trained model
reward_list, episode_list = evaluate_policy(model, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)

# Evaluate a 'base' policy -- this line raises the TypeError below
reward_list_base, episode_list_base = evaluate_policy(BasePolicy, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)
I am getting an error on the last line: TypeError: BasePolicy.predict() missing 1 required positional argument: 'self'. I would like to compare the trained policy with a 'Do Nothing' policy starting from the same initial state.
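For context, this is roughly the 'Do Nothing' behaviour I have in mind: a hypothetical DoNothingPolicy that always returns a zero action. I am not sure whether evaluate_policy accepts a plain object like this in place of a trained model or BasePolicy instance, so treat it as a sketch of the intent:

import numpy as np

class DoNothingPolicy:
    # Hypothetical policy that always outputs a zero action (no torque for Pendulum-v1)
    def __init__(self, action_space):
        self.action_space = action_space

    def predict(self, observation, state=None, episode_start=None, deterministic=True):
        # evaluate_policy passes a batch of observations (one per env),
        # so return one zero action per observation
        n_envs = observation.shape[0] if observation.ndim > 1 else 1
        actions = np.zeros((n_envs, *self.action_space.shape), dtype=self.action_space.dtype)
        return actions, None

do_nothing = DoNothingPolicy(env.action_space)
reward_list_base, episode_list_base = evaluate_policy(do_nothing, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)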
Additionally, one issue with this setup is that every time I run evaluate_policy on the trained model, reward_list changes, even though I have fixed the environment seed.
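To illustrate the second issue, calling evaluate_policy twice in a row on the trained model gives two different reward lists:

rewards_run1, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)
rewards_run2, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10, return_episode_rewards=True)

# I expected these to match because of env.seed(123456), but they differ between runs
print(rewards_run1)
print(rewards_run2)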
Any help in clarifying my doubts would be appreciated.