I am trying to make a custom Gym environment with five actions, each of which can take a continuous value. To implement this, I defined the following action_space:
```python
import numpy as np
from gym import spaces

# Inside the environment's __init__: one 1-D Box per action
self.action_space = spaces.Tuple((
    spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    spaces.Box(low=np.array([-2]), high=np.array([2]), dtype=np.float32),
    spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    spaces.Box(low=np.array([1]), high=np.array([20]), dtype=np.int8),
))
```
However, when I try to run a PPO model (from stable_baselines3), I get the following error:

```
AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(0.0, 1.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(-2.0, 2.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(1, 20, (1,), int8)) was provided
```
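The error message lists the only action-space types this PPO implementation accepts (Box, Discrete, MultiDiscrete, MultiBinary), and Tuple is not among them. One common workaround, sketched here under assumptions, is to merge the five actions into a single 5-dimensional Box and split the vector back into named parameters inside `step()`. The names `ACTION_NAMES`, `LOW`, `HIGH` and `decode_action` are hypothetical; the bounds are copied from the Tuple above, and bestOf is assumed to be acceptable as a continuous value that is clipped and then rounded to an integer.

```python
import numpy as np

# Hypothetical names, taken from the keys used in the question's dict attempt
ACTION_NAMES = ["Temperature", "topP", "frequencyPenalty", "presencePenalty", "bestOf"]
# Per-dimension bounds copied from the original Tuple of Boxes
LOW = np.array([0.0, 0.0, -2.0, 0.0, 1.0], dtype=np.float32)
HIGH = np.array([1.0, 1.0, 2.0, 1.0, 20.0], dtype=np.float32)

# In the environment's __init__ (requires gym), a single Box would replace the Tuple:
#     self.action_space = spaces.Box(low=LOW, high=HIGH, dtype=np.float32)


def decode_action(action):
    """Split a flat 5-element action vector into named parameters (assumed layout)."""
    # Clip to the bounds in case the policy emits out-of-range values
    a = np.clip(np.asarray(action, dtype=np.float32), LOW, HIGH)
    params = {name: float(value) for name, value in zip(ACTION_NAMES, a)}
    # bestOf was an integer in the original space, so round it back to an int
    params["bestOf"] = int(round(params["bestOf"]))
    return params
```

Inside `step(self, action)` the environment would then call `decode_action(action)` and use the resulting dictionary instead of indexing into a tuple.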
I searched around a bit and found this on GitHub: Link. Based on it, I changed my code as follows:
```python
self.action_space = {
    "Temperature": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    "topP": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    "frequencyPenalty": spaces.Box(low=np.array([-2]), high=np.array([2]), dtype=np.float32),
    "presencePenalty": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    "bestOf": spaces.Box(low=np.array([1]), high=np.array([20]), dtype=np.int8),
}
```
However, this still raised the same error.
I also found this answer: Link
According to it, my code should work, since I am using a Tuple space as well.
How can I convert this into a space type that PPO accepts for the action_space?
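A related variation: the Stable-Baselines3 documentation recommends a normalized, symmetric action space (for example [-1, 1]) for continuous actions. If the single Box is declared that way instead of with the real bounds, the environment has to map each component back onto its actual range. The sketch below assumes a 5-dimensional Box on [-1, 1]; `rescale`, `LOW` and `HIGH` are hypothetical names, with the bounds copied from the question.

```python
import numpy as np

# Real parameter bounds from the question (Temperature, topP, frequencyPenalty, presencePenalty, bestOf)
LOW = np.array([0.0, 0.0, -2.0, 0.0, 1.0])
HIGH = np.array([1.0, 1.0, 2.0, 1.0, 20.0])


def rescale(action):
    """Map each component linearly from [-1, 1] onto [LOW, HIGH]."""
    # Clip first, since the policy's raw output is not guaranteed to stay in [-1, 1]
    a = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    return LOW + (a + 1.0) * 0.5 * (HIGH - LOW)
```

An action of all -1 maps to the lower bounds, all 1 maps to the upper bounds, and 0 maps to the midpoint of each range (10.5 for bestOf, which would still need rounding to an integer).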