I am trying to learn about Markov decision problems and was given the algorithm for Value Iteration, but I am confused about how to turn it into actual C++ code, mainly the parts where summations and the like occur. Here is the algorithm:
function VALUE-ITERATION(P, R) returns a utility matrix
    inputs: P, a transition-probability matrix
            R, a reward matrix
    local variables: U, utility matrix, initially identical to R
                     U', utility matrix, initially identical to R
    repeat
        U <- U'
        for each state i do
            U'(s_i) <- R(s_i) + max_a Summation_j P^a_ij * U(s_j)
        end
    until max_(s_i) |U(s_i) - U'(s_i)| < e
    return U
This looks like hieroglyphics to me. Is there a simpler algorithm that would be of more help to me, or could somebody dumb it down for me?
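For what it is worth, here is a rough sketch of how the pseudocode might map onto C++ loops. It is not a definitive implementation, and several things in it are assumptions that do not come from the algorithm as given: the name valueIteration, the storage layout P[a][i][j] (one transition matrix per action), storing R and U as flat vectors with one entry per state, and the maxIters cap, which is only a safety stop because the pseudocode has no discount factor and so is not guaranteed to converge on every input. The "Summation_j" becomes the innermost for loop, "max_a" becomes the loop over actions that keeps the largest sum, and the "until" test becomes the running maximum stored in delta.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Assumed layout (not from the original pseudocode):
// P[a][i][j] = probability of moving from state i to state j under action a.
using Matrix = std::vector<std::vector<double>>;

// Sketch of VALUE-ITERATION(P, R). eps plays the role of "e" in the pseudocode.
// maxIters is an added safety cap, not part of the original algorithm.
std::vector<double> valueIteration(const std::vector<Matrix>& P,
                                   const std::vector<double>& R,
                                   double eps,
                                   int maxIters = 1000) {
    assert(!P.empty());                    // need at least one action
    const std::size_t n = R.size();        // number of states

    std::vector<double> U = R;             // U, initially identical to R
    std::vector<double> Unext = R;         // U', initially identical to R
    double delta = 0.0;
    int iter = 0;

    do {
        U = Unext;                         // U <- U'
        delta = 0.0;
        for (std::size_t i = 0; i < n; ++i) {          // for each state i
            double best = -std::numeric_limits<double>::infinity();
            for (const Matrix& Pa : P) {               // max over actions a
                double sum = 0.0;
                for (std::size_t j = 0; j < n; ++j) {  // Summation over j
                    sum += Pa[i][j] * U[j];
                }
                best = std::max(best, sum);
            }
            Unext[i] = R[i] + best;        // U'(s_i) <- R(s_i) + max_a sum
            // Track max over states of |U(s_i) - U'(s_i)| for the "until" test.
            delta = std::max(delta, std::fabs(U[i] - Unext[i]));
        }
        ++iter;
    } while (delta >= eps && iter < maxIters);

    return U;                              // the pseudocode returns U, not U'
}
```

As a small made-up check: take three states in a row where state 2 is terminal (all of its transition rows are zero), action 0 moves one step right, action 1 stays put, and R = {-0.25, -0.25, 1.0}. Calling valueIteration(P, R, 1e-9) on that input should settle after three passes with U = {0.5, 0.75, 1.0}.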