
I am struggling to understand one aspect of the Markov Decision Process.

When I am in state s and take action a, is the transition to the next state s' deterministic or stochastic?

In most examples it seems to be deterministic. However, I found one example in the picture below (David Silver's lecture on RL) where the transition is stochastic, namely following the action "Pub".

[state-transition diagram from David Silver's RL lecture]

siva

1 Answer


In general, in Markov Decision Processes the transitions between states can be stochastic. The probability of transitioning to another state is usually denoted P_a(s, s'), where s is the current state, s' the next state, and a the action performed.

The deterministic case is a special case of the stochastic one: if P_a(s, s') equals 1 for a single s' and 0 for all the remaining states, the transition is deterministic.
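A minimal sketch of this idea in Python (the state and action names, and the 0.4/0.4/0.2 split, are illustrative and only loosely mirror the diagram in the question):

```python
import random

# Hypothetical MDP fragment: P[(s, a)] maps a (state, action) pair to a
# probability distribution over next states s'.
P = {
    ("Class", "Study"): {"Sleep": 1.0},                            # deterministic
    ("Class", "Pub"):   {"Class": 0.4, "Pub": 0.4, "Sleep": 0.2},  # stochastic
}

_rng = random.Random(0)

def step(s, a):
    """Sample the next state s' with probability P_a(s, s')."""
    dist = P[(s, a)]
    states, probs = zip(*dist.items())
    return _rng.choices(states, weights=probs, k=1)[0]

# "Study" always leads to the same successor; "Pub" does not.
print({step("Class", "Study") for _ in range(10)})  # {'Sleep'}
print({step("Class", "Pub") for _ in range(10)})
```

The deterministic action is just the degenerate distribution: all of its probability mass sits on a single next state, so sampling from it always returns the same s'.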

Pablo EM
  • Could you provide a concrete example of a P_a(s,s') matrix? Is it 3D? It can't just show links between states, as the action taken needs to be taken into consideration as well. – siva Nov 17 '17 at 11:33
  • Notice that P_a(s,s') is a particular entry of the matrix; in your example it could be P_{Pub}(s_0, s_1) = 0.4, P_{Pub}(s_0, s_2) = 0.4, P_{Pub}(s_0, s_3) = 0.2, assuming s_0 is the state where the Pub action can be taken (it would be useful to name the states in your diagram ;) ). Yes, the P matrix depends on 3 variables, namely a, s, and s'. So it's a 3D matrix. – Pablo EM Nov 17 '17 at 11:46
  • One follow up question: Does the reward only depend on the action I take or also on the state I end up in? – siva Nov 17 '17 at 17:21
  • Typically it depends on the reached state, but I think in some cases it could make sense for it to depend also (but not only) on the action and/or the current state. I should check it in some reference to be sure... – Pablo EM Nov 17 '17 at 22:29