Bellman equation

Question

In the Bellman equation

$$V(s) = \max_a \left( R(s, a) + \gamma V(s') \right)$$

where,

s = a particular state (room)

a = action (moving between the rooms)

s′ = state to which the robot goes from s

γ = discount factor

R(s, a) = a reward function which takes a state s and an action a and outputs a reward value

V(s) = value of being in a particular state (the footprint)

My question is: what does the a below max mean, and how do I use it in programming?

This isn't a programming question and is better suited for [stats.SE]. — Arya McCarthy, Dec 08 '20 at 12:45
No, I mean that your question is about how to read a mathematical formula—which isn't about programming or debugging. — Arya McCarthy, Dec 10 '20 at 16:44

score 2 · Answer 1 · answered Dec 08 '20 at 16:40

The a below max means the maximum is taken over all actions a that can be taken in state s. For each action, you add the immediate reward R(s, a) to the discounted value of the next state s', and V(s) is the largest of those sums.
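In code, the max over a is just a loop (or a call to `max`) over the actions available in s. Here is a minimal sketch for a single state, assuming a deterministic world where each action leads to exactly one next state. The names `rewards`, `next_state`, `V` and `gamma`, and all the numbers, are made up for illustration and are not from the question.

```python
gamma = 0.9  # discount factor (assumed value)

# Hypothetical data for one state s with two available actions.
rewards = {"left": 0.0, "right": 1.0}      # R(s, a)
next_state = {"left": "A", "right": "B"}   # s' reached by each action
V = {"A": 10.0, "B": 5.0}                  # current value estimates V(s')

# max over a: evaluate R(s, a) + gamma * V(s') for every action,
# then keep the largest result.
candidates = {a: rewards[a] + gamma * V[next_state[a]] for a in rewards}
v_s = max(candidates.values())                     # value of s
best_action = max(candidates, key=candidates.get)  # action that attains the max
```

With these numbers "left" wins even though its immediate reward is smaller, because it leads to a more valuable next state.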

The Bellman equation can be used to choose the next action for your model when the reward function is known: in each state, pick the action that attains the maximum.
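One common way to put this into practice is value iteration: apply the equation to every state repeatedly until the values stop changing, then act greedily. The sketch below assumes a tiny deterministic world of three rooms in a row with known rewards. The room layout, the function names `value_iteration` and `greedy_action`, and the numbers are all hypothetical, chosen only to illustrate the idea.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Repeatedly apply V(s) = max_a [R(s, a) + gamma * V(s')] until values settle."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if not actions[s]:  # terminal state: no actions, value stays 0
                continue
            new_v = max(
                reward[(s, a)] + gamma * V[transition[(s, a)]]
                for a in actions[s]
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


def greedy_action(s, V, actions, transition, reward, gamma=0.9):
    """Return the action that attains the max in the Bellman equation at s."""
    return max(
        actions[s],
        key=lambda a: reward[(s, a)] + gamma * V[transition[(s, a)]],
    )


# Hypothetical world: rooms 0, 1, 2 in a row; entering room 2 (the goal) pays 1.
states = [0, 1, 2]
actions = {0: ["right"], 1: ["left", "right"], 2: []}
transition = {(0, "right"): 1, (1, "left"): 0, (1, "right"): 2}
reward = {(0, "right"): 0.0, (1, "left"): 0.0, (1, "right"): 1.0}

V = value_iteration(states, actions, transition, reward)
policy = {
    s: greedy_action(s, V, actions, transition, reward)
    for s in states
    if actions[s]
}
```

If the next state is random rather than fixed, the term for V(s') becomes an expectation over possible next states, which this sketch does not cover.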

Please correct me if anything is wrong. Thanks.
