Questions tagged [stable-baselines]

Stable Baselines is a Python library of improved implementations of reinforcement learning algorithms, based on OpenAI Baselines; its successor, Stable Baselines3 (SB3), is implemented in PyTorch. Please mention the exact version of Stable Baselines being used in the body of the question.

277 questions
3
votes
0 answers

Using SubprocVecEnv in SB3 results in "cannot pickle 'weakref' object" error, even though the same vectorized env works for DummyVecEnv. Why?

For multiprocessing I want to use the suggested vec_env_cls, SubprocVecEnv. Using it as in the env line of the code below, or as vec_env_cls=SubprocVecEnv in make_vec_env, both result in an error about an unpicklable object. def…
Stifterson
  • 31
  • 1
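A common fix worth checking here: SubprocVecEnv spawns worker processes, so the script needs an if __name__ == "__main__": guard and the env factory must be picklable (unpicklable objects such as open handles or weakrefs stored on the env cause exactly this error). A minimal sketch of the intended usage, with CartPole standing in for the asker's custom env:

    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import SubprocVecEnv

    if __name__ == "__main__":  # guard required when worker processes are spawned
        # each worker builds its own env, so only the env id / factory must be picklable
        vec_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)
        model = PPO("MlpPolicy", vec_env, verbose=1)
        model.learn(total_timesteps=10_000)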
3
votes
1 answer

Can't install stable-baselines3[extra]

I am having trouble installing stable-baselines3[extra] and am not sure if I missed a dependency. Machine: Mac M1, Python: 3.10.9, pip3: 23.0. !pip3 install 'stable-baselines3[extra]' I used the above command to…
Codovert
  • 43
  • 1
  • 5
3
votes
0 answers

Stable-Baselines3 and PettingZoo

I am trying to understand how to train agents in a PettingZoo environment using the single-agent algorithm PPO implemented in Stable-Baselines3. I'm following this tutorial, where the agents act in a cooperative environment and they are all trained…
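For reference, the route usually taken in that kind of tutorial is to flatten the multi-agent env into an SB3-compatible vector env with SuperSuit, so that every agent shares one policy. A rough sketch under that assumption, using the PettingZoo pistonball env and SuperSuit's preprocessing wrappers (none of these names come from the question itself):

    import supersuit as ss
    from stable_baselines3 import PPO
    from pettingzoo.butterfly import pistonball_v6

    # cooperative parallel env; every agent shares one PPO policy
    env = pistonball_v6.parallel_env()
    env = ss.color_reduction_v0(env, mode="B")            # grayscale image obs
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.frame_stack_v1(env, 3)
    env = ss.pettingzoo_env_to_vec_env_v1(env)             # one vec-env slot per agent
    env = ss.concat_vec_envs_v1(env, 4, base_class="stable_baselines3")
    model = PPO("CnnPolicy", env, verbose=1)
    model.learn(total_timesteps=100_000)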
3
votes
2 answers

Dict Observation Space for Stable Baselines3 Not Working

I've created a minimal reproducible example below; it can be run in a new Google Colab notebook for ease. Once the first install finishes, just Runtime > Restart and Run All for it to take effect. I've made a simple roulette game environment below…
wildcat89
  • 1,159
  • 16
  • 47
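As context for readers of this question: Dict observation spaces do work in SB3, but they require the "MultiInputPolicy" rather than "MlpPolicy". A minimal sketch with a hypothetical two-key observation space (not the asker's roulette env), written against the Gymnasium API:

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces
    from stable_baselines3 import PPO

    class ToyDictEnv(gym.Env):
        """Hypothetical env whose observation is a Dict of two Boxes."""
        def __init__(self):
            self.observation_space = spaces.Dict({
                "balance": spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32),
                "history": spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32),
            })
            self.action_space = spaces.Discrete(3)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            return self.observation_space.sample(), {}

        def step(self, action):
            # dummy dynamics: single-step episodes with zero reward
            return self.observation_space.sample(), 0.0, True, False, {}

    model = PPO("MultiInputPolicy", ToyDictEnv(), verbose=1)  # not "MlpPolicy"
    model.learn(total_timesteps=2_048)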
3
votes
1 answer

Reinforcement learning: deterministic policies worse than non-deterministic policies

We have a custom reinforcement learning environment in which we run a PPO agent from Stable Baselines3 on a multi-action selection problem. The agent learns as expected, but when we evaluate the learned policy from trained agents, the agents…
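For readers comparing such numbers: evaluate_policy defaults to deterministic=True, which for a PPO policy takes the mode of the action distribution instead of sampling as during training, and the two can differ noticeably. A quick comparison sketch, with CartPole standing in for the custom env:

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make("CartPole-v1")                  # stand-in for the custom env
    model = PPO("MlpPolicy", env).learn(20_000)

    # deterministic=True uses the mode of the action distribution,
    # deterministic=False samples actions as during rollout collection
    for det in (True, False):
        mean_r, std_r = evaluate_policy(model, env, n_eval_episodes=50, deterministic=det)
        print(f"deterministic={det}: {mean_r:.1f} +/- {std_r:.1f}")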
3
votes
0 answers

StableBaselines3 - Why does calling "model.learn(50,000)" twice not give the same result as calling "model.learn(100,000)" once?

I am working on a Reinforcement Learning problem in StableBaselines3. I am trying to understand why this code: model = MaskablePPO(MaskableActorCriticPolicy, env, verbose=1, learning_rate=0.0003, gamma=0.975, seed=10, batch_size=256,…
Vladimir Belik
  • 280
  • 1
  • 12
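One detail that often explains part of the difference: by default every learn() call resets the internal timestep counter, so schedules, exploration and logging start over. Continuing training is usually written as in the sketch below (plain PPO shown instead of the asker's MaskablePPO); even then, exact equality with a single 100,000-step run is not guaranteed, since rollout collection restarts at the second call.

    from stable_baselines3 import PPO

    model = PPO("MlpPolicy", "CartPole-v1", seed=10, verbose=1)
    model.learn(total_timesteps=50_000)
    # reset_num_timesteps=False continues schedules and logging from step 50,000
    # instead of restarting them at 0
    model.learn(total_timesteps=50_000, reset_num_timesteps=False)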
3
votes
1 answer

Stable Baselines3 PPO() - how to change clip_range parameter during training?

I want to gradually decrease the clip_range (epsilon, the PPO policy-update clipping parameter) throughout training in my PPO model. I have tried simply running "model.clip_range = new_value", but this doesn't work. In the docs here, it says…
Vladimir Belik
  • 280
  • 1
  • 12
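For reference, PPO accepts a callable for clip_range that receives the remaining training progress (1.0 at the start, 0.0 at the end), so a decaying clip range is normally configured at construction time rather than by assigning model.clip_range afterwards. A small sketch:

    from stable_baselines3 import PPO

    def clip_schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 at the start of learn() to 0.0 at the end
        return 0.2 * progress_remaining

    model = PPO("MlpPolicy", "CartPole-v1", clip_range=clip_schedule, verbose=1)
    model.learn(total_timesteps=100_000)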
3
votes
2 answers

RL + optimization: how to do it better?

I am learning about how to optimize using reinforcement learning. I have chosen the problem of maximum matching in a bipartite graph as I can easily compute the true optimum. Recall that a matching in a graph is a subset of the edges where no two…
3
votes
0 answers

Learning rate scheduler in DQN within stable_baselines3

I'm experimenting with Reinforcement Learning using gym and stable-baselines3, particularly using the DQN implementation of stable-baselines3 for the MountainCar (https://gym.openai.com/envs/MountainCar-v0/). I'm trying to implement a learning rate…
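For context, stable-baselines3's learning_rate parameter also accepts a function of the remaining training progress, which acts as a simple scheduler; a sketch for DQN on MountainCar:

    import gymnasium as gym
    from stable_baselines3 import DQN

    def lr_schedule(progress_remaining: float) -> float:
        # linear decay from 1e-3 at the start of training to 0 at the end
        return 1e-3 * progress_remaining

    env = gym.make("MountainCar-v0")
    model = DQN("MlpPolicy", env, learning_rate=lr_schedule, verbose=1)
    model.learn(total_timesteps=200_000)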
3
votes
0 answers

Problem with adding logic for invalid moves in OpenAI Gym and Stable Baselines

I want to integrate my environment into OpenAI Gym and then use the Stable Baselines library to train it. Link to Stable Baselines: https://stable-baselines.readthedocs.io/ The learning method in Stable Baselines is one-line learning and…
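One common approach to invalid moves, rather than changing the training loop, is action masking; in the SB3 ecosystem this is provided by sb3_contrib's MaskablePPO. A rough sketch, where my_custom_env and its valid_actions() method are hypothetical placeholders for the asker's environment:

    import numpy as np
    from sb3_contrib import MaskablePPO
    from sb3_contrib.common.wrappers import ActionMasker

    def mask_fn(env) -> np.ndarray:
        # boolean array with one entry per discrete action; True = allowed.
        # valid_actions() is a hypothetical method of the custom env.
        return env.valid_actions()

    env = ActionMasker(my_custom_env, mask_fn)   # my_custom_env: placeholder
    model = MaskablePPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=100_000)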
2
votes
2 answers

Stable Baselines 3 support for Farama Gymnasium

I am building an environment in the maintained fork of gym: Gymnasium by Farama. In my gym environment, I state that the action_space = gym.spaces.Discrete(5) and the observation_space = gym.spaces.MultiBinary(25). Running the environment with the…
Lexpj
  • 921
  • 2
  • 6
  • 15
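Worth noting under this question: native Gymnasium support landed in stable-baselines3 2.x, and the bundled env checker can confirm the spaces are accepted. A minimal sketch using the question's Discrete(5) action space and MultiBinary(25) observation space in a toy env:

    import gymnasium as gym
    from gymnasium import spaces
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_checker import check_env

    class ToyEnv(gym.Env):
        def __init__(self):
            self.action_space = spaces.Discrete(5)
            self.observation_space = spaces.MultiBinary(25)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            return self.observation_space.sample(), {}

        def step(self, action):
            # dummy dynamics: single-step episodes with zero reward
            return self.observation_space.sample(), 0.0, True, False, {}

    env = ToyEnv()
    check_env(env)                       # validates spaces and API conformance
    PPO("MlpPolicy", env).learn(total_timesteps=2_048)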
2
votes
0 answers

Why is using a GPU in Stable Baselines 3 slower than using the CPU?

When training on the "CartPole" environment with Stable Baselines 3 using PPO, training the model on a CUDA GPU is almost twice as slow as training it on just the CPU (both in Google Colab and locally). I thought using CUDA for…
Joel
  • 21
  • 2
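For small MlpPolicy networks such as CartPole's, the per-update transfer overhead to the GPU usually outweighs any speedup, which is why the CPU often wins; the device can be forced at construction, as in this quick sketch:

    from stable_baselines3 import PPO

    # small observation/action spaces with MlpPolicy: CPU is usually faster
    model = PPO("MlpPolicy", "CartPole-v1", device="cpu", verbose=1)
    model.learn(total_timesteps=100_000)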
2
votes
1 answer

Why is `ep_rew_mean` much larger than the reward evaluated by the `evaluate_policy()` function?

I wrote a custom gym environment and trained it with PPO provided by stable-baselines3. The ep_rew_mean recorded by TensorBoard is as follows: [figure: the ep_rew_mean curve over 100 million total steps; each episode has 50 steps] As shown in the figure, the…
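A frequent source of such gaps: ep_rew_mean comes from Monitor episode statistics gathered with stochastic actions during rollouts, while evaluate_policy defaults to deterministic actions and reports exact episode returns only on a Monitor-wrapped env. A small sketch of the stochastic comparison, with CartPole standing in for the custom env:

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.monitor import Monitor
    from stable_baselines3.common.evaluation import evaluate_policy

    env = Monitor(gym.make("CartPole-v1"))        # Monitor records episode returns
    model = PPO("MlpPolicy", env, verbose=1).learn(50_000)

    # deterministic=False mirrors the stochastic rollouts behind ep_rew_mean
    mean_r, std_r = evaluate_policy(model, env, n_eval_episodes=20, deterministic=False)
    print(f"stochastic evaluation: {mean_r:.1f} +/- {std_r:.1f}")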
2
votes
1 answer

No GL context; create a Window first

When I ran the Stable Baselines3 RL Colab notebooks, an error occurred in stable_baselines_getting_started.ipynb: record_video('CartPole-v1', model, video_length=500, prefix='ppo-cartpole') GLException Traceback (most…
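That exception typically means the Colab runtime has no display for OpenGL rendering; the getting-started notebooks work around it by starting a virtual display before recording video, roughly as below (assuming xvfb and pyvirtualdisplay are installed in the runtime):

    # prerequisites (in Colab): !apt-get install -y xvfb  and  !pip install pyvirtualdisplay
    from pyvirtualdisplay import Display

    virtual_display = Display(visible=0, size=(1400, 900))
    virtual_display.start()   # provides an X display so pyglet can create a GL context

    # rendering / video recording (e.g. record_video(...)) can now run headless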
2
votes
1 answer

AssertionError: The algorithm only supports as action spaces but Box(-1.0, 1.0, (3,), float32) was provided

So basically I tried converting this custom gym environment from https://github.com/Gor-Ren/gym-jsbsim to use the Farama Foundation's Gymnasium API. This is my repo which I am working on: https://github.com/sryu1/jsbgym When I try training the…
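The assertion is raised because the chosen algorithm does not list Box among its supported action spaces; with a continuous Box(-1.0, 1.0, (3,)) action space, an algorithm such as SAC (or PPO) is needed rather than a discrete-only one like DQN. A minimal sketch on a standard Gymnasium env with a Box action space:

    import gymnasium as gym
    from stable_baselines3 import SAC

    env = gym.make("Pendulum-v1")             # continuous Box action space
    model = SAC("MlpPolicy", env, verbose=1)  # SAC supports Box actions
    model.learn(total_timesteps=10_000)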