Why the termination condition of value-iteration algorithm ( example http://aima-java.googlecode.com/svn/trunk/aima-core/src/main/java/aima/core/probability/mdp/search/ValueIteration.java )
In the MDP (Markov Desicion Process) is
||Ui+1-Ui||< error*(1-gamma)/gamma, where
Ui is vector of utilities
Ui+1 updated vector of utilities
error -error bound used in algorithm
gamma-discount factor used in algorithm
Where does "error*(1-gamma)/gamma" come from? "divided by gamma" is because every step is discounted by gamma? But error*(1-gamma)? And how big must be an error?