where,
s = a particular state (room)
a = action (moving between the rooms)
s′ = state to which the robot goes from s
γ = discount factor
R(s, a) = a reward function which takes a state s and action a and outputs a reward value
V(s) = value of being in a particular state (the footprint)
My question is: what is the a below max, and how do I use it in programming?
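
To make the question concrete, here is a minimal sketch of how the a under max is usually read in code. It assumes the equation in question has the form V(s) = max over a of (R(s, a) + γ V(s′)). The a under max then means "try every action available in state s and keep the largest result". The three-room layout, the reward of 1 for entering the goal room, and the names `transitions`, `reward`, and `value_iteration` are all made up for illustration.

```python
# Hypothetical 3-room corridor: rooms 0, 1, 2; room 2 is the goal.
GAMMA = 0.9  # discount factor γ (assumed value)

# transitions[s][a] = s' : the room reached from s by taking action a.
# Deterministic moves are an assumption made to keep the sketch short.
transitions = {
    0: {"right": 1},
    1: {"left": 0, "right": 2},
    2: {},  # goal room: no further actions
}


def reward(s, a):
    """R(s, a): 1.0 for a move that enters the goal room, otherwise 0.0."""
    return 1.0 if transitions[s][a] == 2 else 0.0


def value_iteration(sweeps=50):
    """Repeatedly apply V(s) = max_a (R(s, a) + γ V(s')) to every room."""
    V = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, actions in transitions.items():
            if not actions:
                continue  # the goal room keeps value 0
            # "max over a": evaluate each action a in state s, keep the best.
            V[s] = max(
                reward(s, a) + GAMMA * V[s_next]
                for a, s_next in actions.items()
            )
    return V


V = value_iteration()
print(V)
```

In other words, a is not a single fixed value: it is the loop variable that ranges over the possible actions, and `max(...)` picks the best outcome among them.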