q table with gym (using box observation space)

Question

I'm trying to run a q-learning algorithm with this observation space:

self.observation_space = spaces.Box(low=np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), high=np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), dtype=np.flo

When I try to access the Q-table like this:

q_value = q_table[state][action]

I'm getting this error:

IndexError: arrays used as indices must be of integer (or boolean) type

So my question is: how am I supposed to access the Q-table when my observation space is defined using spaces.Box?

In case it's needed, this is how the q_table is defined (it's code I took from the internet and am trying to adapt to my project):

num_box = tuple((env.observation_space.high + np.ones(env.observation_space.shape)).astype(int))
q_table = np.zeros(num_box + (env.action_space.n,))

score 1 · Answer 1 · answered Jul 18 '21 at 19:19

You don't say what type q_table is. I will assume it's a NumPy array defined as in "OpenAI Gym and Python set up for Q-learning":

action_space_size = env.action_space.n
state_space_size = env.observation_space.n

q_table = np.zeros((state_space_size, action_space_size))

You're getting this error because you're indexing the NumPy array with something that is not an integer. Again, I haven't seen your code, but I believe you are trying to select a row of the Q-table using the raw observation (a tuple or array of floats) rather than an integer state index.
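To make that concrete, here is a minimal sketch that reproduces the situation with NumPy alone. The table shape and the values of state and action are made up for illustration; they are not taken from your code.

```python
import numpy as np

# Hypothetical small table: 2 observation dimensions with 2 cells each, 3 actions.
q_table = np.zeros((2, 2, 3))

# A Box observation arrives as an array of floats, not integers.
state = np.array([0.3, 0.7], dtype=np.float32)

try:
    q_table[state]  # indexing with a float array is rejected by NumPy
    float_index_error = None
except IndexError as exc:
    float_index_error = str(exc)

print(float_index_error)

# Indexing with a tuple of integers selects one row of action values.
int_state = (0, 1)
action = 2
q_value = q_table[int_state][action]
```

So the table itself is fine; what has to change is how the observation is turned into an index.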

Regardless, you should not use a Box observation space when using Q-learning, but rather a Discrete one. When using Q-learning, you need to know the number of states in advance, to initialize the Q-table. Box spaces are for real values, and the number of dimensions of the space does not define the number of states. For example, if you create a Box space like this:

spaces.Box(low=0, high=1, shape=(2, 2), dtype=np.float16)

you won't have 4 states, but potentially infinitely many. The parameters low=0 and high=1 indicate the minimum and maximum value of each of the four variables in the Box space, but there can be many values between 0 and 1 (0.1, 0.2, etc.). For this reason, you cannot estimate the number of states beforehand.
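A quick way to see this without Gym installed is to draw random values of the same shape and dtype and count how many distinct observations come out. This is only a sketch: rng.uniform is used here as a stand-in for sampling from the Box space, and the sample count of 1000 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for sampling a Box(low=0, high=1, shape=(2, 2), dtype=np.float16).
samples = rng.uniform(0.0, 1.0, size=(1000, 2, 2)).astype(np.float16)

# Flatten each observation to 4 numbers and count the distinct ones.
flat = samples.reshape(1000, -1)
distinct_states = len({tuple(row) for row in flat.tolist()})

print(distinct_states)  # far more than 4
```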

If you use np.uint8 (or any integer type) as dtype, you could potentially count the number of states, but it would still be a stretch to use Box spaces instead of Discrete spaces. Moreover, even using integer values the following will not work:

num_box = tuple((env.observation_space.high + np.ones(env.observation_space.shape)).astype(int))
q_table = np.zeros(num_box + (env.action_space.n,))
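If the observations really have to stay continuous, one common workaround is to discretize each dimension into a fixed number of bins and index the table with the resulting tuple of integers. This is a different technique from anything in your snippet, and only a sketch: N_BINS, N_ACTIONS and the discretize helper are hypothetical names, and the bounds are assumed to be 0 and 1 as in your Box.

```python
import numpy as np

N_BINS = 3      # hypothetical number of bins per dimension
N_DIMS = 10     # the Box in the question has 10 dimensions
N_ACTIONS = 4   # hypothetical; in practice use env.action_space.n

low = np.zeros(N_DIMS)   # assumed to match observation_space.low
high = np.ones(N_DIMS)   # assumed to match observation_space.high


def discretize(observation):
    """Map a float observation in [low, high] to a tuple of bin indices."""
    ratios = (np.asarray(observation, dtype=np.float64) - low) / (high - low)
    bins = np.floor(ratios * N_BINS).astype(int)
    # Clip so that a value exactly equal to high lands in the last bin.
    return tuple(np.clip(bins, 0, N_BINS - 1))


# One table axis per observation dimension, plus one axis for the actions.
q_table = np.zeros((N_BINS,) * N_DIMS + (N_ACTIONS,))

observation = np.array([0.0, 0.1, 0.5, 0.9, 1.0, 0.2, 0.4, 0.6, 0.8, 0.3])
state = discretize(observation)
action = 1
q_value = q_table[state][action]
```

Note that the table grows as N_BINS ** N_DIMS, so with 10 dimensions even a few bins per dimension gets large quickly. That is another reason a Discrete space is the more natural fit for tabular Q-learning.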
