Are these two different formulas for Value-Iteration update equivalent?

Question

While studying MDPs from different sources, I came across two different formulas for the value update in the value-iteration algorithm.

The first one is (the one on Wikipedia and a couple of books):

$$V_{i+1}(s) = \max_a \sum_{s'} P_a(s, s') \left( R_a(s, s') + \gamma V_i(s') \right)$$

And the second one is (in some questions here on Stack, and in my course slides):

$$V_{i+1}(s) = R(s) + \gamma \max_a \sum_{s'} P_a(s, s') \, V_i(s')$$

For a specific iteration, they don't seem to give the same answer. Does one of them converge to the solution faster?

score 1 · Accepted Answer · answered Mar 11 '20 at 17:34

The difference lies in the reward function: R_a(s, s') in the first formula versus R(s) in the second.

The first equation is the more general one.

In the first one, the reward is R_a(s, s') for transitioning from state s to state s' under action a. The reward can differ across states and actions.

But if every state s has a pre-defined reward (regardless of the previous state and the action that led to s), then we can simplify the formula to the second one.
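
As a sketch of that simplification, take the simplest reading, in which the reward depends only on the current state, i.e. R_a(s, s') = R(s). This reading is an assumption, as is the notation: P_a(s, s') for the transition probabilities, γ for the discount factor, and V_i for the value estimate at iteration i. Because the transition probabilities out of s sum to one, R(s) can be pulled out of both the sum and the max:

$$
\begin{aligned}
V_{i+1}(s) &= \max_a \sum_{s'} P_a(s, s') \left( R(s) + \gamma V_i(s') \right) \\
&= \max_a \left[ R(s) \sum_{s'} P_a(s, s') + \gamma \sum_{s'} P_a(s, s') \, V_i(s') \right] \\
&= R(s) + \gamma \max_a \sum_{s'} P_a(s, s') \, V_i(s')
\end{aligned}
$$

Under this reading the two updates coincide at every iteration. If the reward is instead attached to the state being entered, the two updates produce different numbers.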

The final values are not necessarily equal, but the resulting policies are the same.
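
This can be checked numerically. The following is a minimal sketch, not a reference implementation: it builds a small random MDP (the sizes, the seed, the discount 0.9, and all function names are illustrative assumptions) and runs both updates. It assumes a per-state reward `r` and tries two ways of embedding it into the general formula: attached to the state entered, R_a(s, s') = r(s'), and attached to the state left, R_a(s, s') = r(s).

```python
import numpy as np

def q_transition_reward(P, R, V, gamma):
    # Q[a, s] = sum_s' P[a, s, s'] * (R[a, s, s'] + gamma * V[s'])
    # V broadcasts along the last axis (s').
    return np.einsum("ast,ast->as", P, R + gamma * V)

def q_state_reward(P, r, V, gamma):
    # Q[a, s] = r[s] + gamma * sum_s' P[a, s, s'] * V[s']
    return r[None, :] + gamma * (P @ V)

def value_iteration(q_fn, n_states, iters=1000):
    # Plain synchronous value iteration from V = 0 for a fixed number of sweeps.
    V = np.zeros(n_states)
    for _ in range(iters):
        V = q_fn(V).max(axis=0)
    return V, q_fn(V).argmax(axis=0)  # values and greedy policy

# Hypothetical random MDP: 3 actions, 5 states.
rng = np.random.default_rng(0)
A, S, gamma = 3, 5, 0.9
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)   # each row is a probability distribution
r = rng.random(S)                   # per-state reward

R_enter = np.broadcast_to(r[None, None, :], (A, S, S))  # R_a(s, s') = r(s')
R_leave = np.broadcast_to(r[None, :, None], (A, S, S))  # R_a(s, s') = r(s)

V1, pi1 = value_iteration(lambda V: q_transition_reward(P, R_enter, V, gamma), S)
V2, pi2 = value_iteration(lambda V: q_state_reward(P, r, V, gamma), S)
V3, pi3 = value_iteration(lambda V: q_transition_reward(P, R_leave, V, gamma), S)

print("first formula, reward on entered state:", V1, pi1)
print("second formula:                        ", V2, pi2)
print("first formula, reward on left state:   ", V3, pi3)
```

With the reward on the entered state, the converged values should satisfy V2 = r + γ·V1, so the numbers differ while the greedy policies agree. With the reward on the state being left, the first formula reproduces the second one exactly.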
