I would like to train a stable-baselines model on a custom gym environment. The training loop looks like this:
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    print(f"action: {action}")
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()
There are basic examples like this, e.g. here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html
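For completeness, the setup around that loop looks roughly like this in my script (a sketch; PPO2 is simply the algorithm I happen to use, and my_maze_env / MazeEnv stand in for my custom environment):

from stable_baselines import PPO2
from my_maze_env import MazeEnv  # hypothetical module/class for my custom env

env = MazeEnv()
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)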
Somewhere else (within the environment class) I defined an action_space:
self.action_space = spaces.Discrete(5)
With this basic definition of the action_space, the actions returned by model.predict at each step appear to be plain integers from 0 to 4.
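Just to illustrate what I mean, sampling from such a space yields bare indices:

from gym import spaces

space = spaces.Discrete(5)
print([space.sample() for _ in range(5)])  # e.g. [3, 0, 4, 1, 2]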
Now, to make the question a little more practical, assume my environment describes a maze. The full set of available actions in this case could be
realActions = [_UP, _DOWN, _LEFT, _RIGHT]
In a maze, however, the available actions change from step to step. For example, at the upper wall of the maze the actions would only be:
realActions = [_DOWN, _LEFT, _RIGHT]
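A helper to compute this inside my environment class would look roughly like this (a sketch; agent_pos, width and height are simplified assumptions about my maze representation, and inner walls are ignored):

_UP, _DOWN, _LEFT, _RIGHT = 0, 1, 2, 3

def getCurrentAvailableActions(self):
    # sketch: assumes self.agent_pos = (row, col) on a
    # self.height x self.width grid
    row, col = self.agent_pos
    actions = []
    if row > 0:
        actions.append(_UP)
    if row < self.height - 1:
        actions.append(_DOWN)
    if col > 0:
        actions.append(_LEFT)
    if col < self.width - 1:
        actions.append(_RIGHT)
    return actions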
So I would try to take this into consideration:
    env.render()
    realActions = env.getCurrentAvailableActions()
    # set the gym action_space to the reduced number of options:
    env.action_space = spaces.Discrete(len(realActions))
And in env.step I would execute realActions[action] in the maze to do the correct move.
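Inside env.step, that indirection would look roughly like this (a sketch; _applyMove is a hypothetical helper of mine):

def step(self, action):
    # map the model's index onto the currently valid moves
    realActions = self.getCurrentAvailableActions()
    real_action = realActions[action]
    # caveat: the same index can mean a different move in every step,
    # e.g. action 0 may be _UP in one state and _DOWN in the next
    obs, reward, done, info = self._applyMove(real_action)  # hypothetical helper
    return obs, reward, done, info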
Unfortunately, the reassignment of env.action_space does not seem to be recognized by my model.
There is another important point: the workaround of mapping the sampled action onto realActions, instead of defining the action_space itself with these values, could never train correctly, because the model would never know what effect the action it generates has on the maze: it does not see the mapping from its own action to realActions.
So my question is: does stable-baselines / gym provide a practical way to limit the action_space to the dynamically (per-step) available actions?
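To make it concrete, what I am hoping for is something along these lines (the action_mask argument is purely hypothetical, I made it up to illustrate the idea):

import numpy as np

mask = np.zeros(env.action_space.n, dtype=bool)
mask[env.getCurrentAvailableActions()] = True
# hypothetical argument: tell the model which actions are valid this step
action, _states = model.predict(obs, action_mask=mask)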
Thank you!