
As far as I understand Q-learning, a Q-value is a measure of "how good" a particular state-action pair is. This is usually represented in a table in one of the following ways (see fig.):

[Figure: two Q-table layouts, a state-to-state transition table (top) and a state-action table (bottom)]

  1. Are both representations valid?
  2. How do you determine the best action if the Q-table is given as a state-to-state transition table (as shown in the top Q-table in the figure), especially if the state transitions are not deterministic (i.e., taking an action from a state can land you in different states at different times)?
Pablo EM
ajikodajis

1 Answer

  1. No. In general, an action is not equivalent to a transition to a particular state. There can be a different number of actions than states, the same action could lead to different states depending on which state it is performed in, and different actions could lead to the same state. Transitions can also be stochastic.

  2. See (1).
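To make the distinction concrete, here is a minimal sketch (with made-up numbers) of the standard state-action Q-table layout, where rows are states and columns are actions. With this layout, the best action in a state is simply the column with the highest Q-value, regardless of which state each action might transition to:

```python
import numpy as np

# Hypothetical Q-table for 3 states and 2 actions (values are illustrative).
# Rows are indexed by state, columns by action: Q[s, a] estimates the value
# of taking action a in state s. Note the table is indexed by (state, action),
# NOT by (state, next_state), so stochastic transitions pose no problem.
Q = np.array([
    [0.5,  1.2],   # state 0
    [0.0, -0.3],   # state 1
    [2.1,  1.9],   # state 2
])

def best_action(state):
    """Greedy action selection: the action with the highest Q-value in this state."""
    return int(np.argmax(Q[state]))

print(best_action(0))  # action 1, since 1.2 > 0.5
print(best_action(2))  # action 0, since 2.1 > 1.9
```

Because the table's second axis ranges over actions rather than successor states, the same action can lead to different next states on different occasions; the Q-value simply averages over that randomness during learning.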

Don Reba