
I am trying to make a custom gym environment with five actions, all of which can take continuous values. To implement this, I used the following action_space:

self.action_space = spaces.Tuple((spaces.Box(low=np.array([0]),high=np.array([1]), dtype=np.float32),
                           spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           spaces.Box(low=np.array([-2]), high=np.array([2]),dtype=np.float32),
                           spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           spaces.Box(low=np.array([1]), high=np.array([20]),dtype=np.int8)))

However, when I try to train a PPO model (from stable_baselines3), I get the following error:

AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(0.0, 1.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(-2.0, 2.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(1, 20, (1,), int8)) was provided

I searched around for this issue and found this on GitHub:

Link. Based on it, I changed my code as follows:

self.action_space = {"Temperature": spaces.Box(low=np.array([0]),high=np.array([1]), dtype=np.float32),
                           "topP": spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           "frequencyPenalty": spaces.Box(low=np.array([-2]), high=np.array([2]),dtype=np.float32),
                           "presencePenalty": spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           "bestOf": spaces.Box(low=np.array([1]), high=np.array([20]),dtype=np.int8)}

But this still returned the same error.

Also, I found this answer: Link

According to this, my code should work, since I am also using a Tuple space.

How do I convert this to an accepted data type for the action_space?

Ravish Jha

2 Answers


Unfortunately, the stable-baselines3 algorithm implementations only support Box, Discrete, MultiDiscrete and MultiBinary action spaces, and not every algorithm supports all four (see the "Implemented Algorithms" table in the stable-baselines3 documentation).

The link you posted refers to OpenAI's code, not to stable-baselines3.

You could look into other frameworks and check whether their algorithm implementations support Tuple or Dict action spaces, or implement your own algorithm.

Alternatively, check whether your Box-type actions can easily be converted into discrete actions. A vector of discrete actions is supported in stable-baselines3 through MultiDiscrete.
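A minimal sketch of that conversion, assuming each parameter is split into a fixed number of evenly spaced levels. The bin counts and the names `LOWS`, `HIGHS`, `N_BINS` and `decode_action` are hypothetical choices for illustration; the bounds come from the question. The environment would declare `spaces.MultiDiscrete(N_BINS)` and map the chosen indices back to values inside `step()`:

```python
import numpy as np

# Bounds of the five parameters from the question.
LOWS = np.array([0.0, 0.0, -2.0, 0.0, 1.0])
HIGHS = np.array([1.0, 1.0, 2.0, 1.0, 20.0])
# Hypothetical number of levels per parameter. 20 levels for bestOf
# yields exactly the integers 1..20.
N_BINS = np.array([11, 11, 41, 11, 20])

def decode_action(indices):
    """Map MultiDiscrete indices (one per parameter) to continuous values."""
    indices = np.asarray(indices)
    # Index 0 maps to the low bound and index n-1 to the high bound,
    # with evenly spaced values in between.
    return LOWS + indices * (HIGHS - LOWS) / (N_BINS - 1)

# In the environment's __init__ (assumption):
#     self.action_space = spaces.MultiDiscrete(N_BINS)
values = decode_action([5, 10, 20, 0, 19])
```

The trade-off is resolution. More levels give finer control but a larger action space for the agent to explore.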


"All of which can have continuous values"

The link you found is about mixing integer and continuous actions. If all of your actions are continuous, you can simply use a single Box:

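import numpy as np
from gym import spaces

# One 5-dimensional Box; index i holds the i-th parameter.
self.action_space = spaces.Box(low=np.array([0, 0, -2, 0, 1], dtype=np.float32),
                               high=np.array([1, 1, 2, 1, 20], dtype=np.float32),
                               dtype=np.float32)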
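In the question, bestOf is an integer. One option (a sketch, not part of the original answer) is to keep it continuous in the Box and round it inside `step()`. The helper name `unpack_action` is hypothetical. The parameter names come from the question's dictionary, and their order must match the Box bounds:

```python
import numpy as np

# Order must match the low/high arrays of the Box action space.
PARAM_NAMES = ["Temperature", "topP", "frequencyPenalty", "presencePenalty", "bestOf"]
LOW = np.array([0, 0, -2, 0, 1], dtype=np.float32)
HIGH = np.array([1, 1, 2, 1, 20], dtype=np.float32)

def unpack_action(action):
    """Turn a flat Box action into a dict of named parameters (hypothetical helper)."""
    # Clip defensively, in case the caller did not keep the action within bounds.
    action = np.clip(np.asarray(action, dtype=np.float32), LOW, HIGH)
    params = {name: float(value) for name, value in zip(PARAM_NAMES, action)}
    # bestOf is an integer parameter, so round the continuous output.
    params["bestOf"] = int(round(params["bestOf"]))
    return params

# Inside step(action):
#     params = unpack_action(action)
```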

Muhammad Yasirroni