
Using Python 3.6, Ubuntu 18.04, Gym 0.15.4, ROS Melodic, TensorFlow 1.14, and rl_coach 1.01:

I have built a custom Gym environment that uses a 360-element array as its `observation_space`.

```python
import numpy as np
from gym import spaces
# inside the environment's __init__:
high = np.array([4.5] * 360)  # 360-degree scan, max range 4.5 meters
low = np.array([0.0] * 360)
self.observation_space = spaces.Box(low, high, dtype=np.float32)
```

However, this is not enough state to train properly with the Clipped PPO algorithm, and I want to add the following features to my state:

- Position in the world (x, y coordinates)
- Orientation in the world (quaternion: x, y, z, w)
- Linear trajectory (x, y, z coordinates)
- Angular trajectory (x, y, z coordinates)

I put the four features above into their own NumPy arrays and tried to return them all as the state object, but obviously that does not match the observation space. `spaces.Box` confuses me: I am assuming I cannot put all these features into a single NumPy array because their upper and lower bounds differ, but I can't work out how to create a `spaces.Box` object with multiple "features".

TIA
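Regarding the assumption that differing bounds rule out a single array: `spaces.Box` accepts `low` and `high` as arrays with a separate value per element, so features with different ranges can share one flat `Box`. The following is a minimal sketch assuming Gym's `spaces.Box` and NumPy. Only the 4.5 m scan limit comes from the question, and the ±1 quaternion bound follows from it being a unit quaternion. The position and velocity limits are hypothetical placeholders, and `make_observation` is a hypothetical helper name.

```python
import numpy as np
from gym import spaces

# Per-feature bounds. Position and velocity limits are HYPOTHETICAL placeholders.
SCAN_LOW, SCAN_HIGH = np.zeros(360), np.full(360, 4.5)   # lidar ranges (m), from the question
POS_LOW, POS_HIGH = np.full(2, -10.0), np.full(2, 10.0)  # x, y (assumed world size)
QUAT_LOW, QUAT_HIGH = np.full(4, -1.0), np.full(4, 1.0)  # unit quaternion x, y, z, w
LIN_LOW, LIN_HIGH = np.full(3, -2.0), np.full(3, 2.0)    # linear x, y, z (assumed)
ANG_LOW, ANG_HIGH = np.full(3, -3.0), np.full(3, 3.0)    # angular x, y, z (assumed)

# Concatenate the bounds in a fixed order; observations must use the same order.
low = np.concatenate([SCAN_LOW, POS_LOW, QUAT_LOW, LIN_LOW, ANG_LOW]).astype(np.float32)
high = np.concatenate([SCAN_HIGH, POS_HIGH, QUAT_HIGH, LIN_HIGH, ANG_HIGH]).astype(np.float32)
observation_space = spaces.Box(low, high, dtype=np.float32)


def make_observation(scan, position, orientation, linear, angular):
    """Hypothetical helper: stack the features in the same order as the bounds."""
    return np.concatenate([scan, position, orientation, linear, angular]).astype(np.float32)


obs = make_observation(
    np.full(360, 1.0), np.array([0.5, -0.5]), np.array([0.0, 0.0, 0.0, 1.0]),
    np.zeros(3), np.zeros(3),
)
print(observation_space.shape)          # (372,)
print(observation_space.contains(obs))  # True
```

Inside the environment this would be assigned to `self.observation_space`, and `step()`/`reset()` would return the output of `make_observation`. The trade-off is that feature boundaries exist only by convention (index ranges), whereas a `Dict` or `Tuple` space keeps them explicit.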

learningtofly

2 Answers


gym.spaces.Dict is what you need:

```python
import gym

spaces = {
  'position': gym.spaces.Box(low=0, high=100, shape=(2,)),
  'orientation': ...
}
dict_space = gym.spaces.Dict(spaces)
```
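Applied to all the features from the question, a complete `Dict` space might look like the following sketch. The key names are illustrative, and the position and velocity limits are hypothetical placeholders (only the 4.5 m scan limit and the ±1 quaternion bound are grounded in the question). The observation returned by `step()`/`reset()` must then be a dict with exactly these keys.

```python
import numpy as np
import gym

# Illustrative key names; position and velocity limits are HYPOTHETICAL.
dict_space = gym.spaces.Dict({
    'scan': gym.spaces.Box(low=0.0, high=4.5, shape=(360,), dtype=np.float32),
    'position': gym.spaces.Box(low=-10.0, high=10.0, shape=(2,), dtype=np.float32),
    'orientation': gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
    'linear': gym.spaces.Box(low=-2.0, high=2.0, shape=(3,), dtype=np.float32),
    'angular': gym.spaces.Box(low=-3.0, high=3.0, shape=(3,), dtype=np.float32),
})

# An observation is a plain dict whose values match each sub-space.
obs = {
    'scan': np.full(360, 1.0, dtype=np.float32),
    'position': np.array([0.5, -0.5], dtype=np.float32),
    'orientation': np.array([0.0, 0.0, 0.0, 1.0], dtype=np.float32),
    'linear': np.zeros(3, dtype=np.float32),
    'angular': np.zeros(3, dtype=np.float32),
}
print(dict_space.contains(obs))  # True
```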
stefanbschneider
  • Do you know an RL algorithm that supports `Dict` observation spaces? I could not find one, and the docs for Stable-Baselines3 say `Tuple` and `Dict` are not supported. – Philipp Apr 26 '21 at 13:34
  • @Philipp I have been using PPO from the [Ray RLlib](https://docs.ray.io/en/master/rllib.html) framework with `Dict` observations. As I understand it, all or most of the algorithms in the framework (there are many!) support `Dict` observations. – stefanbschneider Apr 27 '21 at 08:20
  • @CGFoX So when you train the agent, does Ray RLlib directly concatenate all the `Dict` observations, or is there a separate NN for each value of the `Dict`? – Satya Prakash Dash Jun 14 '21 at 03:36
  • @SatyaPrakashDash I'm not 100% sure, but I believe RLlib simply concatenates the values into a single vector and passes that vector to a single NN, i.e., not separate NNs for each entry in the dict. Typically that's what you want, since you need one NN output (value, action, etc.) based on all observations, not multiple outputs each based on only part of the observations. – stefanbschneider Jun 15 '21 at 12:57
  • @CGFoX Yeah, I think the same. Thanks. – Satya Prakash Dash Jun 16 '21 at 13:08
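The comments above describe concatenating the dict values into one vector that feeds a single network. Whether or not RLlib does exactly this internally, the same idea is easy to apply by hand, for example for an algorithm that only accepts a `Box`. The following is a minimal sketch that assumes every entry of the `Dict` is a `Box`. `flatten_dict_space` and `flatten_obs` are hypothetical helper names, not Gym functions, and the bounds are placeholders.

```python
import numpy as np
import gym


def flatten_dict_space(dict_space):
    """Hypothetical helper: lay the Box bounds of each entry end to end.

    Assumes every sub-space is a gym.spaces.Box. Keys are visited in sorted
    order so bounds and observations are always concatenated the same way.
    """
    keys = sorted(dict_space.spaces.keys())
    low = np.concatenate([dict_space.spaces[k].low.ravel() for k in keys])
    high = np.concatenate([dict_space.spaces[k].high.ravel() for k in keys])
    flat = gym.spaces.Box(low.astype(np.float32), high.astype(np.float32), dtype=np.float32)
    return flat, keys


def flatten_obs(obs, keys):
    """Hypothetical helper: concatenate a dict observation in the given key order."""
    return np.concatenate([np.asarray(obs[k], dtype=np.float32).ravel() for k in keys])


# Placeholder bounds for illustration only.
dict_space = gym.spaces.Dict({
    'position': gym.spaces.Box(low=-10.0, high=10.0, shape=(2,), dtype=np.float32),
    'orientation': gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
})
flat_space, keys = flatten_dict_space(dict_space)
vec = flatten_obs({'position': [1.0, 2.0], 'orientation': [0.0, 0.0, 0.0, 1.0]}, keys)
print(keys)                      # ['orientation', 'position']
print(vec)                       # [0. 0. 0. 1. 1. 2.]
print(flat_space.contains(vec))  # True
```

Newer Gym releases also ship utilities for this (`gym.spaces.flatten_space`, `gym.wrappers.FlattenObservation`). Whether they are available in the installed version (the question uses Gym 0.15.4) is worth checking before writing a helper by hand.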

Please look at the `gym.spaces.Tuple` class reference.

P.S. You can look at how I have used it for my own ROS env here
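A `gym.spaces.Tuple` holds the sub-spaces by position rather than by name, so the observation is a tuple in the same order. The following is a minimal sketch for the question's features. The ordering is an illustrative choice, and the position and velocity limits are hypothetical placeholders.

```python
import numpy as np
import gym

# Order: (scan, position, orientation, linear, angular).
# Position and velocity limits are HYPOTHETICAL placeholders.
tuple_space = gym.spaces.Tuple((
    gym.spaces.Box(low=0.0, high=4.5, shape=(360,), dtype=np.float32),
    gym.spaces.Box(low=-10.0, high=10.0, shape=(2,), dtype=np.float32),
    gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
    gym.spaces.Box(low=-2.0, high=2.0, shape=(3,), dtype=np.float32),
    gym.spaces.Box(low=-3.0, high=3.0, shape=(3,), dtype=np.float32),
))

# The observation is a tuple whose elements match the sub-spaces in order.
obs = (
    np.full(360, 1.0, dtype=np.float32),
    np.array([0.5, -0.5], dtype=np.float32),
    np.array([0.0, 0.0, 0.0, 1.0], dtype=np.float32),
    np.zeros(3, dtype=np.float32),
    np.zeros(3, dtype=np.float32),
)
print(len(tuple_space.spaces))    # 5
print(tuple_space.contains(obs))  # True
```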

pzolaiyc
  • I looked at your source code and how you return a `gym.spaces.Tuple` object as the observation. However, the RL algorithms in Stable-Baselines3, at least, do not support `Tuple` observation or action spaces. Can you tell me which algorithm you used? – Philipp Apr 26 '21 at 13:33