As far as I understand Q-learning, a Q-value is a measure of "how good" a particular state-action pair is. It is usually represented as a table in one of the following ways (see fig.):
- Are both representations valid?
- How do you determine the best action if the Q-table is given as a state-to-state transition table (as shown in the top Q-table in the figure), especially if the state transitions are non-deterministic (i.e., taking an action from a given state can land you in different states at different times)?
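For concreteness, here is how I currently picture the standard state-action form and greedy action selection (a minimal sketch with made-up values, not real training output):

```python
import numpy as np

# Hypothetical Q-table in the state-action form:
# rows = states, columns = actions.
q_table = np.array([
    [0.1, 0.5],   # state 0: Q(s0, a0), Q(s0, a1)
    [0.3, 0.2],   # state 1: Q(s1, a0), Q(s1, a1)
    [0.0, 0.4],   # state 2: Q(s2, a0), Q(s2, a1)
])

def best_action(state: int) -> int:
    """Greedy policy: pick the action (column) with the highest Q-value."""
    return int(np.argmax(q_table[state]))

print(best_action(0))  # -> 1, since Q(s0, a1) = 0.5 > Q(s0, a0) = 0.1
```

In this form the best action for a state is just an argmax over that state's row, so I understand that case; my confusion is specifically about the state-to-state form, where a single column (next state) may be reachable via different actions.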