Q-table representation for nested lists as states and tuples as actions

Question

How can I create a Q-table when my states are lists and my actions are tuples?

Example of states for N = 3

[[1], [2], [3]]
[[1], [2, 3]]
[[1], [3, 2]]
[[2], [3, 1]]
[[1, 2, 3]]

Example of actions for those states

[[1], [2], [3]] -> (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)
[[1], [2, 3]] -> (1, 2), (2, 0), (2, 1)
[[1], [3, 2]] -> (1, 3), (3, 0), (3, 1)
[[2], [3, 1]] -> (2, 3), (3, 0), (3, 2)
[[1, 2, 3]] -> (1, 0)

I was wondering about

# q_table = {state: {action: q_value}}

But I don't think that's a good design.

score 0 · Answer 1 · answered Apr 05 '22 at 01:31

1. Should your states really be of type list?

list is a mutable type. tuple is the equivalent immutable type. Do you mutate your states during learning? I doubt it.

In any case, if you use list, you cannot use it as a dictionary key: lists are mutable and therefore unhashable. A tuple of tuples holding the same values can be used as a key.
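A minimal sketch of that conversion, assuming the order inside each inner list matters (the question treats [[1], [2, 3]] and [[1], [3, 2]] as different states). The helper name freeze is made up for illustration:

```python
def freeze(state):
    """Convert a nested-list state into a hashable tuple of tuples."""
    return tuple(tuple(group) for group in state)


state = [[1], [2, 3]]
key = freeze(state)  # a tuple of tuples, usable as a dict key

# The nested-dict layout from the question, keyed by the frozen state.
q_table = {key: {(1, 2): 0.0, (2, 0): 0.0, (2, 1): 0.0}}

# A raw list is unhashable, so using it as a key raises TypeError.
try:
    {state: 0.0}
    list_key_ok = True
except TypeError:
    list_key_ok = False
```

If the order of the inner lists is irrelevant in your environment, sorting them inside freeze would make equivalent states share one key; that is an assumption to check against your own problem.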

2. Otherwise this is a pretty good representation

In a reinforcement learning context, you'll want to:

- get a specific value for Q
- look at the Q values for all possible actions in a specific state (to find the maximal Q)

Your representation allows you to do both of these with minimal complexity, and is pretty clear. So it is a good representation.
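Both operations can be sketched as follows. The Q-values here are invented purely for illustration, and the state is written as a tuple of tuples so that it is hashable:

```python
# Hypothetical Q-values for one state from the question.
q_table = {
    ((1,), (2, 3)): {(1, 2): 0.5, (2, 0): -0.1, (2, 1): 0.2},
}
state = ((1,), (2, 3))

# 1. Get a specific Q-value: two dictionary lookups.
q_value = q_table[state][(2, 0)]

# 2. Find the best action in a state and its value.
best_action = max(q_table[state], key=q_table[state].get)
best_value = q_table[state][best_action]
```

Each lookup is an average-case constant-time hash access, and the max scans only the actions that exist for that state.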

vstack17 · Answer 2 · 2022-04-05T21:19:35.530

Using a nested dictionary is actually a reasonable design choice for custom tabular reinforcement learning. It's called tabular for a reason :)

You could use defaultdict to initialize the q-table to a certain value, e.g., 0.

from collections import defaultdict

default_q_value = 0.0  # initial Q-value for unseen (state, action) pairs
q = defaultdict(lambda: defaultdict(lambda: default_q_value))

or without defaultdict:

q = {s: {a: default_q_value for a in actions} for s in states}

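It is then convenient to perform updates by taking the max over the next state's action values, like so

# Q-learning: move q[state][action] toward the TD target
best_next_state_val = max(q[next_state].values())
q[state][action] += alpha * (reward + gamma * best_next_state_val - q[state][action])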
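A minimal sketch of a complete tabular Q-learning step on a nested defaultdict. The function name q_update and the default values for alpha and gamma are assumptions for illustration. Note that the update moves the current estimate toward the target (it subtracts q[state][action]), and that max needs a default when the next state has no recorded actions yet, for example a terminal or unseen state:

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q-value."""
    # An unseen or terminal next state has no entries; treat its value as 0.
    best_next = max(q[next_state].values(), default=0.0)
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


q = defaultdict(lambda: defaultdict(float))
s = ((1,), (2,), (3,))
s_next = ((1,), (2, 3))
new_value = q_update(q, s, (2, 3), reward=1.0, next_state=s_next)
```

The transition and reward used here are made up; in practice they come from your environment.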

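One thing I'd watch out for: if you train an agent using a Q-table like this and select actions with a plain max/argmax, it will pick the same action every time when all the action values are equal (such as right after the Q-function is initialized), because ties always resolve to the first entry. Breaking ties randomly, or exploring with something like epsilon-greedy, avoids this.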
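One way to avoid that is to break ties at random and add some exploration. This is a sketch of one common approach, not the only one; the function names and the epsilon default are assumptions:

```python
import random


def greedy_action(action_values, rng=random):
    """Return a best action, choosing uniformly among exact ties."""
    best = max(action_values.values())
    best_actions = [a for a, v in action_values.items() if v == best]
    return rng.choice(best_actions)


def epsilon_greedy(action_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(action_values))
    return greedy_action(action_values, rng)


# Freshly initialized values: every action is tied at 0.0.
initial_values = {(1, 2): 0.0, (2, 0): 0.0, (2, 1): 0.0}
action = epsilon_greedy(initial_values)
```

Here action_values would be q[state] from the nested dictionary, so the selection only ever considers actions that are valid in that state.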

Finally, if you don't want to use dictionaries, you can map state and action tuples to integer indices, store the mappings in dictionaries, and do a lookup whenever you pass a state/action to or from your environment implementation. You can then use those integers as indices into a 2D NumPy array.
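A rough sketch of the index-mapping approach, using two states from the question. Since not every action is valid in every state, this version fills invalid cells with -inf so that argmax never selects them; that masking is an added assumption, not something the array layout requires:

```python
import numpy as np

# Valid actions per state (states as tuples of tuples so they are hashable).
valid_actions = {
    ((1,), (2, 3)): [(1, 2), (2, 0), (2, 1)],
    ((1, 2, 3),): [(1, 0)],
}

all_actions = sorted({a for acts in valid_actions.values() for a in acts})
state_index = {s: i for i, s in enumerate(valid_actions)}
action_index = {a: j for j, a in enumerate(all_actions)}

# Start every cell at -inf, then set valid (state, action) pairs to 0.
q = np.full((len(state_index), len(action_index)), -np.inf)
for s, acts in valid_actions.items():
    for a in acts:
        q[state_index[s], action_index[a]] = 0.0

# Write one value, then read back the greedy action for that state.
s = ((1,), (2, 3))
q[state_index[s], action_index[(2, 0)]] = 0.7
best_action = all_actions[int(np.argmax(q[state_index[s]]))]
```

The array form makes vectorized operations easy, but the table is dense: it allocates a cell for every (state, action) combination, including the invalid ones.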
