I want to integrate my environment into OpenAI Gym and then use the Stable Baselines library to train an agent on it.
Link to Stable Baselines:
https://stable-baselines.readthedocs.io/
In Stable Baselines, training happens in a single call, so you don't have access to the actions that are taken during training:
model.learn(total_timesteps=10000)
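For context, here is roughly what a full training script looks like (a minimal sketch; PPO2 and CartPole-v1 are just example choices, not part of my actual setup):

import gym
from stable_baselines import PPO2

env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)  # actions are sampled internally by the library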
More specifically, you never write the line where an action is sampled from the environment yourself:
action = env.action_space.sample()
However, I'd like to add some logic at the point where the next action is chosen and reject actions that fail a legality check (like illegal moves on a chess board), something like:
for _ in range(1000):
    action = env.action_space.sample()
    if some_logic(action):  # keep the first action the legality check accepts
        break
One way to do this is to write a wrapper around the action space's sample() function so that it only returns legal actions, like the class DiscreteWrapper(spaces.Discrete) in the following link:
https://github.com/rockingdingo/gym-gomoku/blob/master/gym_gomoku/envs/gomoku.py
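For illustration, a sketch of that wrapper idea (is_legal is a placeholder predicate you would supply; it is not part of Gym):

from gym import spaces

class DiscreteWrapper(spaces.Discrete):
    # A Discrete space whose sample() only returns legal actions.
    def __init__(self, n, is_legal):
        super().__init__(n)
        self.is_legal = is_legal  # placeholder callback: action -> True/False

    def sample(self):
        # Resample until the legality check passes (assumes a legal action exists).
        while True:
            action = super().sample()
            if self.is_legal(action):
                return action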
But the problem is that Stable Baselines only accepts certain space types, so it doesn't allow this either.
How can I do this in a way that integrates with the Stable Baselines framework and doesn't violate its requirements? If that is not possible at all, does anyone know of a reinforcement learning framework that, unlike Stable Baselines, gives access to the actions taken during training?