Convergence of value iteration

Question

Why is the termination condition of the value-iteration algorithm (example: http://aima-java.googlecode.com/svn/trunk/aima-core/src/main/java/aima/core/probability/mdp/search/ValueIteration.java )

for an MDP (Markov Decision Process) the following?

||Ui+1 - Ui|| < error*(1-gamma)/gamma, where

Ui is the vector of utilities
Ui+1 is the updated vector of utilities
error is the error bound used in the algorithm
gamma is the discount factor used in the algorithm

Where does "error*(1-gamma)/gamma" come from? Is the division by gamma there because every step is discounted by gamma? And what about error*(1-gamma)? Also, how large should the error be?

Can you explain a bit more, especially "What is MDP?" and "What are your parameters (the Ui, gamma, error, etc)?" — justhalf, Nov 11 '13 at 02:54

score 0 · Accepted Answer · answered Nov 11 '13 at 06:35

The quantity ||Ui+1 - Ui|| is called the Bellman error or Bellman residual.

See Williams and Baird, 1993 for use in MDPs.

See Littman, 1994 for use in POMDPs.
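As a sketch of where the factor comes from, here is the usual contraction argument. It assumes 0 < gamma < 1 and uses notation that is not in the question: B for one Bellman update (so U_{i+1} = B U_i), U^* for the true utilities (the fixed point of B), and the max norm throughout. The only fact used is that B shrinks distances by a factor of gamma.

```latex
\begin{align*}
\|U_{i+1} - U^*\|
  &\le \sum_{k=1}^{\infty} \|U_{i+k+1} - U_{i+k}\|
     && \text{(telescoping sum and triangle inequality, since } U_{i+k} \to U^* \text{)} \\
  &\le \sum_{k=1}^{\infty} \gamma^{k}\, \|U_{i+1} - U_i\|
     && \text{(contraction: } \|BU - BV\| \le \gamma\, \|U - V\| \text{)} \\
  &= \frac{\gamma}{1-\gamma}\, \|U_{i+1} - U_i\| .
\end{align*}
```

So if ||Ui+1 - Ui|| < error*(1-gamma)/gamma, then ||Ui+1 - U*|| < error. Under this reading, error is just the accuracy you want in the final utilities, so how big it should be is your choice; the (1-gamma)/gamma factor converts that target into a threshold on the change between two successive iterations.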

Novak

I do not see any reference to the error*(1-gamma)/gamma termination condition there. – user34618 Nov 11 '13 at 12:00
Theorem three of Littman; taken from the performance bounds section of Williams and Baird. – Novak Nov 11 '13 at 15:13
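The bound discussed in these comments can also be checked numerically. The following is a minimal sketch, not the aima-java implementation: the 3-state MDP, its rewards and transition probabilities, and all function names are made up for illustration. It stops when ||Ui+1 - Ui|| < error*(1-gamma)/gamma and then compares the result with a reference obtained by running many more updates.

```python
def bellman_update(U, R, P, gamma):
    """One backup: U'(s) = R(s) + gamma * max_a sum_s2 P[s][a][s2] * U(s2)."""
    return [
        R[s] + gamma * max(
            sum(p * U[s2] for s2, p in enumerate(P[s][a]))
            for a in range(len(P[s]))
        )
        for s in range(len(U))
    ]


def max_norm_diff(U, V):
    """Max-norm distance between two utility vectors."""
    return max(abs(u - v) for u, v in zip(U, V))


def value_iteration(R, P, gamma, error):
    """Iterate until ||U_next - U|| < error * (1 - gamma) / gamma."""
    U = [0.0] * len(R)
    threshold = error * (1 - gamma) / gamma
    while True:
        U_next = bellman_update(U, R, P, gamma)
        delta = max_norm_diff(U_next, U)
        U = U_next
        if delta < threshold:
            return U


# Hypothetical 3-state, 2-action MDP; every number here is an assumption.
R = [0.0, 1.0, -1.0]  # reward R(s) for each state
P = [                 # P[s][a][s2] = probability of moving s -> s2 under action a
    [[0.8, 0.2, 0.0], [0.1, 0.0, 0.9]],
    [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]],
    [[0.3, 0.3, 0.4], [0.0, 0.0, 1.0]],
]
gamma = 0.9
error = 1e-3

U = value_iteration(R, P, gamma, error)

# Reference utilities: apply far more backups than the stopping rule needs.
U_star = [0.0] * len(R)
for _ in range(5000):
    U_star = bellman_update(U_star, R, P, gamma)

print("distance to reference:", max_norm_diff(U, U_star))
print("requested error:      ", error)
```

If the argument holds, the printed distance should be smaller than the requested error. Pure Python lists are used here only to keep the sketch dependency-free.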
