
I'm experimenting with reinforcement learning using gym and stable-baselines3, specifically with the stable-baselines3 DQN implementation on MountainCar (https://gym.openai.com/envs/MountainCar-v0/).

I'm trying to implement a learning rate scheduler that reduces the learning rate once the episode reward has stayed above a certain threshold for a given number of iterations. In pseudocode, the behaviour I'm after is roughly this (the threshold, patience, and decay factor below are just placeholders):
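    REWARD_THRESHOLD = -110  # placeholder: MountainCar-v0 counts as solved around -110
    PATIENCE = 10            # placeholder: consecutive checks above the threshold

    def reduce_lr_on_high_reward(current_lr, recent_mean_rewards):
        # Halve the learning rate once the mean episode reward has stayed
        # above REWARD_THRESHOLD for PATIENCE consecutive evaluations.
        recent = recent_mean_rewards[-PATIENCE:]
        if len(recent) == PATIENCE and all(r > REWARD_THRESHOLD for r in recent):
            return current_lr * 0.5
        return current_lr

Here's what I've tried so far: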

  • Pass a function instead of a number to learning_rate when defining the model, since learning_rate can be a callable. However, it seems to be evaluated only once at the start, and the learning rate doesn't update later on (a sketch of what I passed is at the end of the question).
  • Pass a function as lr_schedule in the policy_kwargs:
    import gym
    import torch
    import stable_baselines3
    from stable_baselines3 import DQN
    from stable_baselines3.common.callbacks import EvalCallback

    env = gym.make('MountainCar-v0')
    # You can also load other environments such as CartPole or Acrobot, e.g.
    # env = gym.make('CartPole-v1'). See https://gym.openai.com/docs/ for descriptions.

    env = stable_baselines3.common.monitor.Monitor(env, log_dir)

    # Evaluates the agent periodically and logs the results.
    callback = EvalCallback(env, log_path=log_dir, deterministic=True)

    policy_kwargs = dict(activation_fn=torch.nn.ReLU,
                         net_arch=nn_layers,
                         lr_schedule=lr_schedule_custom)

    model = DQN("MlpPolicy", env, policy_kwargs=policy_kwargs)

However, I get the error __init__() got multiple values for argument 'lr_schedule', even though the documentation (https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) doesn't distinguish the policy's lr_schedule argument from the other arguments I'm passing in policy_kwargs. How should I do this?
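
For reference, this is the kind of callable I passed in the first attempt. As far as I understand, stable-baselines3 calls it with progress_remaining (1.0 at the start of training, 0.0 at the end), so it has no access to the reward history:

    from stable_baselines3 import DQN

    def linear_schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start) to 0.0 (end of training)
        return 0.001 * progress_remaining

    model = DQN("MlpPolicy", env, learning_rate=linear_schedule)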

Thanks a lot!
