
I want to integrate my environment into OpenAI Gym and then train it with the Stable Baselines library.

Link to Stable Baselines:

https://stable-baselines.readthedocs.io/

In Stable Baselines, training happens through a single one-line call, so you don't have access to the actions that are taken during training:

model.learn(total_timesteps=10000)

More specifically, the line where you sample an action from the environment never appears in your own code:

action = space.sample()

However, I'd like to add some logic at the point where the next action is chosen, and reject the actions that my logic rules out (like illegal moves on a chess board), something like:

for _ in range(1000):
    action = env.action_space.sample()
    if not some_logic(action):  # illegal action: resample
        continue
    # ... use the legal action ...
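Spelled out as a self-contained sketch (the `is_legal` predicate and its even-actions-only rule are hypothetical placeholders, not from any library), the rejection-sampling idea is:

```python
import random

def is_legal(action):
    # Hypothetical legality check: here, only even action ids are legal.
    return action % 2 == 0

def sample_legal_action(n_actions, max_tries=1000):
    """Resample uniformly until a legal action is found (or give up)."""
    for _ in range(max_tries):
        action = random.randrange(n_actions)
        if is_legal(action):
            return action
    raise RuntimeError("no legal action found")

action = sample_legal_action(10)
```

The problem described below is exactly that this loop has nowhere to live once `model.learn()` owns the sampling.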

One way to do this is to write a wrapper around the action space's `sample()` function so that it only returns legal actions, like the `class DiscreteWrapper(spaces.Discrete)` in the following link:

https://github.com/rockingdingo/gym-gomoku/blob/master/gym_gomoku/envs/gomoku.py
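The pattern in that file, reduced to a minimal standalone sketch (no gym dependency here; the real wrapper subclasses `gym.spaces.Discrete`, and the legal-action set would come from the board state rather than being passed in), is roughly:

```python
import random

class LegalOnlyDiscrete:
    """Discrete-like space whose sample() returns only legal actions.

    Stand-in for the DiscreteWrapper(spaces.Discrete) idea from
    gym-gomoku; `valid_actions` is a hypothetical constructor
    argument used here for illustration.
    """
    def __init__(self, n, valid_actions):
        self.n = n
        self.valid_actions = list(valid_actions)

    def sample(self):
        # Draw directly from the legal subset instead of [0, n).
        return random.choice(self.valid_actions)

space = LegalOnlyDiscrete(9, valid_actions=[0, 4, 8])
```

Overriding `sample()` like this keeps the space's interface intact, which is why it looks promising at first.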

But the problem is that Stable Baselines only accepts certain space types and doesn't allow that either.

How can I do this in a way that integrates with the Stable Baselines framework and doesn't violate its constraints? If that is not possible at all, does anyone know of a reinforcement learning framework that, unlike Stable Baselines, allows access to the actions?
