
I am struggling to understand one aspect of the Markov Decision Process.

When I am in state s and take action a, is the transition to the next state s' deterministic or stochastic?

In most examples it seems to be deterministic. However, I found one example in the picture below (David Silver's lecture on RL) where the transition is stochastic, namely following the action "Pub".

[state-transition diagram from David Silver's RL lecture]

siva

1 Answer


In general, in Markov Decision Processes the transitions between states can be stochastic. The probability of transitioning to another state is usually denoted P_a(s, s'), where s is the current state, s' the next state, and a the action performed.

The deterministic case is a special case of the stochastic one: if P_a(s, s') equals 1 for a single s' and 0 for all the remaining states, the transition is deterministic.
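A minimal sketch of this idea in Python (the state and action names, and the 0.4/0.4/0.2 split, are illustrative and only loosely mirror the diagram in the question):

```python
import random

# Hypothetical MDP fragment: P[(s, a)] maps a (state, action) pair to a
# probability distribution over next states s'.
P = {
    ("Class", "Study"): {"Sleep": 1.0},                            # deterministic
    ("Class", "Pub"):   {"Class": 0.4, "Pub": 0.4, "Sleep": 0.2},  # stochastic
}

_rng = random.Random(0)

def step(s, a):
    """Sample the next state s' with probability P_a(s, s')."""
    dist = P[(s, a)]
    states, probs = zip(*dist.items())
    return _rng.choices(states, weights=probs, k=1)[0]

# "Study" always leads to the same successor; "Pub" does not.
print({step("Class", "Study") for _ in range(10)})  # {'Sleep'}
print({step("Class", "Pub") for _ in range(10)})
```

The deterministic action is just the degenerate distribution: all of its probability mass sits on a single next state, so sampling from it always returns the same s'.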

Pablo EM
  • Could you provide a concrete example of a P_a(s,s') matrix? Is it 3D? It can't just show links between states, as the action taken needs to be taken into consideration as well. – siva Nov 17 '17 at 11:33
  • Notice that P_a(s,s') is a particular entry of the matrix; in your example it could be P_{Pub}(s_0, s_1) = 0.4, P_{Pub}(s_0, s_2) = 0.4, P_{Pub}(s_0, s_3) = 0.2, assuming s_0 is the state where the Pub action can be taken (it would be useful to name the states in your diagram ;) ). Yes, the P matrix depends on 3 variables, namely a, s, and s'. So it's a 3D matrix. – Pablo EM Nov 17 '17 at 11:46
  • One follow up question: Does the reward only depend on the action I take or also on the state I end up in? – siva Nov 17 '17 at 17:21
  • Typically it depends on the reached state, but I think in some cases it could make sense for it to depend also (but not only) on the action and/or the current state. I should check it in some reference to be sure... – Pablo EM Nov 17 '17 at 22:29