
I'm trying to use this code from a repo on GitHub (https://github.com/nicknochnack/Reinforcement-Learning-for-Trading-Custom-Signals/blob/main/Custom%20Signals.ipynb), in Point 3:

model = A2C('MlpLstmPolicy', env, verbose=1)
model.learn(total_timesteps=1000000)

I ran into a lot of problems with stable-baselines on a different line, so I switched to stable-baselines3, but it seems that MlpLstmPolicy doesn't work there. ChatGPT suggested changing it to:


from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import CheckpointCallback

env = make_vec_env('env', n_envs=4, seed=0)
env = DummyVecEnv([lambda: env])

model = PPO('MlpLstmPolicy', env, verbose=1)

But I get this error: Error: Attempted to look up malformed environment ID: b'env'. (Currently all IDs must be of the form ^(?:[\w:-]+/)?([\w:.-]+)-v(\d+)$.)

I saw that the first option passed env to the model constructor, so that's what I did here.

I've replaced "env" with everything else I found in the code, but nothing worked.

Any help would be appreciated.

Unagi71

1 Answer


Recurrent policies are not yet supported directly in stable-baselines3, but you can use RecurrentPPO from the sb3-contrib package; I think that's what you want.

gehirndienst