Stable Baselines is a Python library with implementations of various reinforcement learning algorithms, maintained as a fork of OpenAI Baselines (Stable Baselines3 is its PyTorch successor). Please mention the exact version of Stable Baselines being used in the body of the question.
Questions tagged [stable-baselines]
277 questions
3
votes
0 answers
Using SubprocVecEnv in SB3 results in "cannot pickle 'weakref' object" error, even though the same vectorized env works for DummyVecEnv. Why?
For multiprocessing I want to use the suggested vec_env_cls, SubprocVecEnv. Using it as in the env line in the code below, or as vec_env_cls=SubprocVecEnv in make_vec_env, results in an unpicklable-object error either way.
def…
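
A common cause of this error is that SubprocVecEnv spawns worker processes, so the environment factory has to be a picklable top-level callable and env creation has to sit behind an `if __name__ == "__main__":` guard. A minimal sketch of that pattern (CartPole-v1 and `make_env` are stand-ins, not the asker's code; with SB3 1.x use `import gym` instead of Gymnasium):

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env():
    # Top-level function (picklable) that returns a fresh env instance
    return gym.make("CartPole-v1")

if __name__ == "__main__":
    # SubprocVecEnv forks/spawns worker processes, so everything it needs
    # must be picklable and creation must be guarded by the __main__ check.
    vec_env = make_vec_env(make_env, n_envs=4, vec_env_cls=SubprocVecEnv)
    model = PPO("MlpPolicy", vec_env, verbose=0)
    model.learn(10_000)
```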

Stifterson
- 31
- 1
3
votes
1 answer
Can't install stable-baselines3[extra]
I am having trouble installing stable-baselines3[extra]. Not sure if I missed installing any dependency to make this work.
Machine: Mac M1,
Python: Python 3.10.9,
pip3: pip 23.0
!pip3 install 'stable-baselines3[extra]'
I used the above command to…
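
If the install does go through, a quick sanity check is to import the package and a couple of the optional dependencies that the `[extra]` option pulls in (this snippet is only an illustrative check, assuming the pip command above succeeded):

```python
# Verify the core package and a couple of the "extra" dependencies import cleanly.
import stable_baselines3
print(stable_baselines3.__version__)

import cv2          # from opencv-python, installed by the [extra] option
import tensorboard  # logging backend, also installed by the [extra] option
```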

Codovert
- 43
- 1
- 5
3
votes
0 answers
Stable Baselines3 and PettingZoo
I am trying to understand how to train agents in a PettingZoo environment using the single-agent algorithm PPO implemented in Stable Baselines3.
I'm following this tutorial where the agents act in a cooperative environment and they are all trained…
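
The usual pattern in that tutorial is to flatten the PettingZoo parallel env into a vectorized env with SuperSuit, so one shared PPO policy controls every agent. A rough sketch under that assumption (the SuperSuit wrapper names and version suffixes below follow the pistonball tutorial and may differ across releases):

```python
import supersuit as ss
from pettingzoo.butterfly import pistonball_v6
from stable_baselines3 import PPO

# Parameter sharing: every agent becomes one more "environment copy",
# so a single PPO policy is trained on all of them.
env = pistonball_v6.parallel_env()
env = ss.color_reduction_v0(env, mode="B")      # preprocessing used in the tutorial
env = ss.resize_v1(env, x_size=84, y_size=84)
env = ss.frame_stack_v1(env, 3)
env = ss.pettingzoo_env_to_vec_env_v1(env)       # agents -> vector-env slots
env = ss.concat_vec_envs_v1(env, 4, num_cpus=1, base_class="stable_baselines3")

model = PPO("CnnPolicy", env, verbose=1)
model.learn(100_000)
```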

Onil90
- 171
- 1
- 8
3
votes
2 answers
Dict Observation Space for Stable Baselines3 Not Working
I've created a minimal reproducible example below, this can be run in a new Google Colab notebook for ease. Once the first install finishes, just Runtime > Restart and Run All for it to take effect.
I've made a simple roulette game environment below…
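
For Dict observation spaces, SB3 requires the "MultiInputPolicy" rather than "MlpPolicy", and the env has to return a dict of arrays matching the declared space. A stripped-down sketch under the old gym API / SB3 1.x (hypothetical toy env, not the roulette example):

```python
import numpy as np
import gym
from gym import spaces
from stable_baselines3 import PPO

class ToyDictEnv(gym.Env):
    """Hypothetical minimal env with a Dict observation space."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Dict({
            "balance": spaces.Box(low=0.0, high=1e6, shape=(1,), dtype=np.float32),
            "last_spin": spaces.Discrete(37),
        })

    def _obs(self):
        return {"balance": np.array([100.0], dtype=np.float32), "last_spin": 0}

    def reset(self):
        return self._obs()

    def step(self, action):
        return self._obs(), 0.0, True, {}

# Dict observations need MultiInputPolicy; "MlpPolicy" would raise an error here.
model = PPO("MultiInputPolicy", ToyDictEnv(), verbose=0)
model.learn(1_000)
```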

wildcat89
- 1,159
- 16
- 47
3
votes
1 answer
Reinforcement learning: deterministic policies worse than non-deterministic policies
We have a custom reinforcement learning environment in which we run a PPO agent from Stable Baselines3 on a multi-action selection problem. The agent learns as expected, but when we evaluate the learned policy from trained agents, the agents…
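
For PPO, `deterministic=True` takes the mode of the action distribution instead of sampling, so a gap between stochastic training behaviour and deterministic evaluation can be measured directly with SB3's helper. A sketch (CartPole-v1 stands in for the custom environment from the question):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

model = PPO("MlpPolicy", "CartPole-v1", verbose=0)
model.learn(20_000)

# deterministic=True uses the distribution mode, deterministic=False samples actions.
mean_det, std_det = evaluate_policy(model, model.get_env(), n_eval_episodes=50, deterministic=True)
mean_sto, std_sto = evaluate_policy(model, model.get_env(), n_eval_episodes=50, deterministic=False)
print(mean_det, mean_sto)
```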

GHE
- 75
- 4
3
votes
0 answers
StableBaselines3 - Why does calling "model.learn(50,000)" twice not give the same result as calling "model.learn(100,000)" once?
I am working on a Reinforcement Learning problem in StableBaselines3.
I am trying to understand why this code:
model = MaskablePPO(MaskableActorCriticPolicy, env, verbose=1, learning_rate=0.0003, gamma=0.975, seed=10, batch_size=256,…
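
One reason the two runs differ (beyond sampling noise) is that a second `learn()` call resets the timestep counter, and with it any schedules and logging, unless `reset_num_timesteps=False` is passed; even then, rollout collection restarts from a fresh environment reset between calls, so the runs are not bit-identical. A minimal sketch, shown with plain PPO for brevity (the same applies to MaskablePPO):

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", seed=10, verbose=0)

# One call:
#   model.learn(total_timesteps=100_000)
# Two calls: by default the second call resets the timestep counter (and any
# schedules and logging); pass reset_num_timesteps=False to continue counting.
model.learn(total_timesteps=50_000)
model.learn(total_timesteps=50_000, reset_num_timesteps=False)
```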

Vladimir Belik
- 280
- 1
- 12
3
votes
1 answer
Stable Baselines3 PPO() - how to change clip_range parameter during training?
I want to gradually decrease the clip_range (epsilon, PPO's policy-clipping parameter) throughout training in my PPO model.
I have tried to simply run "model.clip_range = new_value", but this doesn't work.
In the docs here, it says…
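
SB3 converts `clip_range` into a schedule internally, which is why reassigning `model.clip_range` after construction has no effect on training; the supported route is to pass a callable of the remaining training progress to the constructor. A sketch (linear decay from 0.2 to 0 is just an example choice):

```python
from stable_baselines3 import PPO

def clip_schedule(progress_remaining: float) -> float:
    # progress_remaining goes from 1.0 (start of training) down to 0.0 (end),
    # so this anneals the clip range linearly from 0.2 to 0.
    return 0.2 * progress_remaining

model = PPO("MlpPolicy", "CartPole-v1", clip_range=clip_schedule, verbose=0)
model.learn(50_000)
```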

Vladimir Belik
- 280
- 1
- 12
3
votes
2 answers
RL + optimization: how to do it better?
I am learning how to do optimization using reinforcement learning. I have chosen the problem of maximum matching in a bipartite graph, as I can easily compute the true optimum.
Recall that a matching in a graph is a subset of the edges where no two…
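
For reference, the true optimum the agent is compared against can be computed exactly; a sketch using networkx's Hopcroft-Karp implementation (the graph and node sets here are made up purely for illustration):

```python
import networkx as nx

# A small bipartite graph: "left" nodes 0-2, "right" nodes 'a'-'c'.
G = nx.Graph()
left = [0, 1, 2]
G.add_nodes_from(left, bipartite=0)
G.add_nodes_from(["a", "b", "c"], bipartite=1)
G.add_edges_from([(0, "a"), (0, "b"), (1, "b"), (2, "c")])

# Maximum-cardinality matching; the result maps each matched node to its partner,
# so the number of matched pairs is len(matching) // 2.
matching = nx.bipartite.maximum_matching(G, top_nodes=left)
print(len(matching) // 2)
```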

byteful
- 309
- 1
- 8
3
votes
0 answers
Learning rate scheduler in DQN within stable_baselines3
I'm experimenting with Reinforcement Learning using gym and stable-baselines3, particularly using the DQN implementation of stable-baselines3 for the MountainCar (https://gym.openai.com/envs/MountainCar-v0/).
I'm trying to implement a learning rate…
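
The `learning_rate` argument of SB3 algorithms accepts either a float or a schedule, i.e. a function of the remaining training progress, which is the usual way to implement decay. A sketch for DQN on MountainCar (the linear decay from 1e-3 is only an example):

```python
import gym
from stable_baselines3 import DQN

def lr_schedule(progress_remaining: float) -> float:
    # progress_remaining: 1.0 at the start of training, 0.0 at the end
    return 1e-3 * progress_remaining   # linear decay from 1e-3 to 0

model = DQN("MlpPolicy", gym.make("MountainCar-v0"), learning_rate=lr_schedule, verbose=0)
model.learn(100_000)
```
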
3
votes
0 answers
Problem with adding logic for invalid moves in OpenAI Gym and Stable Baselines
I want to integrate my environment into OpenAI Gym and then use the Stable Baselines library to train it.
Link to Stable Baselines:
https://stable-baselines.readthedocs.io/
Learning in Stable Baselines is done with a one-line call, and…
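
Two common ways to handle invalid moves are to return a penalty (and an unchanged state) from `step()` for invalid actions, or to mask them out entirely; the masking route needs MaskablePPO from the sb3-contrib package rather than plain Stable Baselines. A rough sketch of masking under a gym-based sb3-contrib 1.x stack (the toy env and mask are purely illustrative):

```python
import numpy as np
import gym
from gym import spaces
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

class ToyEnv(gym.Env):
    """Hypothetical environment in which only some actions are legal."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(4)
        self.observation_space = spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        return np.zeros(4, dtype=np.float32), 0.0, True, {}

def mask_fn(env) -> np.ndarray:
    # Return a boolean vector marking which actions are currently valid
    return np.array([True, True, False, False])

env = ActionMasker(ToyEnv(), mask_fn)
model = MaskablePPO("MlpPolicy", env, verbose=0)
model.learn(1_000)
```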

Saeid Ghafouri
- 163
- 6
2
votes
2 answers
Stable Baselines 3 support for Farama Gymnasium
I am building an environment in the maintained fork of gym: Gymnasium by Farama. In my gym environment, I state that the action_space = gym.spaces.Discrete(5) and the observation_space = gym.spaces.MultiBinary(25). Running the environment with the…
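
Stable Baselines3 supports Gymnasium environments natively from version 2.0 onward (earlier 1.x releases expect the old gym API or a compatibility wrapper), so with a recent SB3 the Gymnasium env can be passed in directly. A sketch (CartPole-v1 stands in for the custom Discrete/MultiBinary env from the question):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# With stable-baselines3 >= 2.0, Gymnasium envs work directly.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(10_000)
```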

Lexpj
- 921
- 2
- 6
- 15
2
votes
0 answers
Why is using the GPU in Stable Baselines 3 slower than using the CPU?
When training the "CartPole" environment with Stable Baselines 3 using PPO, I find that training the model on a CUDA GPU is almost twice as slow as training it with just the CPU (both in Google Colab and locally).
I thought using cuda for…
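
For the small MLP policies used on tasks like CartPole, per-step CPU-GPU transfers usually outweigh any speedup from GPU matrix multiplies, which is why CPU is often faster for PPO with MlpPolicy; the device can be forced explicitly:

```python
from stable_baselines3 import PPO

# Forcing CPU is typically faster for small MLP policies such as CartPole's.
model = PPO("MlpPolicy", "CartPole-v1", device="cpu", verbose=0)
model.learn(100_000)
```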

Joel
- 21
- 2
2
votes
1 answer
Why is `ep_rew_mean` much larger than the reward evaluated by the `evaluate_policy()` function?
I wrote a custom gym environment and trained it with the PPO implementation provided by stable-baselines3. The ep_rew_mean recorded by TensorBoard is as follows:
[figure: ep_rew_mean curve over 100 million total steps, 50 steps per episode]
As shown in the figure, the…

Aramiis
- 21
- 2
2
votes
1 answer
No GL context; create a Window first
When I ran the Stable Baselines3 RL Colab Notebooks, an error occurred.
stable_baselines_getting_started.ipynb
record_video('CartPole-v1', model, video_length=500, prefix='ppo-cartpole')
GLException Traceback (most…
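
Rendering from a headless Colab runtime needs an X virtual display started before anything creates a GL context; the SB3 notebooks rely on xvfb plus pyvirtualdisplay for this (assumed installed, e.g. via `apt-get install -y xvfb` and `pip install pyvirtualdisplay`). A sketch:

```python
# Start a virtual display before recording/rendering in a headless runtime.
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

# Only afterwards call record_video(...) / env.render() as in the notebook.
```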

Kei TSUKAMOTO
- 21
- 3
2
votes
1 answer
AssertionError: The algorithm only supports as action spaces but Box(-1.0, 1.0, (3,), float32) was provided
So basically I tried converting this custom gym environment from https://github.com/Gor-Ren/gym-jsbsim to use the Farama Foundation's Gymnasium API. This is the repo which I am working on: https://github.com/sryu1/jsbgym
When I try training the…
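
That assertion is raised when the chosen algorithm only supports discrete action spaces (DQN, for example), while this env exposes a continuous Box(-1.0, 1.0, (3,)) action space; an algorithm with continuous-action support such as SAC, TD3 or PPO is needed. A sketch with a stand-in continuous env:

```python
from stable_baselines3 import SAC

# Pendulum-v1 has a Box action space and stands in for the JSBSim environment here.
model = SAC("MlpPolicy", "Pendulum-v1", verbose=0)
model.learn(10_000)
```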

sryu1
- 45
- 1
- 6