In the Bellman equation

$$V(s) = \max_a \left( R(s, a) + \gamma V(s') \right)$$

where,

s = a particular state (room)

a = action (moving between the rooms)

s′ = state to which the robot goes from s

γ = discount factor

R(s, a) = a reward function which takes a state s and action a and outputs a reward value

V(s) = value of being in a particular state (the footprint)

My question is: what does the a below max mean, and how do I use it in programming?

TinyCoder

1 Answer

The a below max means the maximum is taken over all actions a that can be taken at s: for each action you add the reward R(s, a) to the discounted value of the next state s′, and V(s) is the largest of these sums.
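In code, the max over a is usually just a loop (or `max(...)`) over the available actions. Here is a minimal sketch, assuming a made-up corridor of three rooms where room 2 is the goal and moves are deterministic. The names `GAMMA`, `next_state`, `reward` and `bellman_backup`, and the value 0.9, are my own choices for illustration, not something from your problem.

```python
GAMMA = 0.9                   # discount factor (assumed value)
GOAL = 2                      # hypothetical goal room
STATES = [0, 1, 2]            # three rooms in a row
ACTIONS = ["left", "right"]

def next_state(s, a):
    """s': where the robot ends up; the goal room is absorbing."""
    if s == GOAL:
        return s
    return max(s - 1, 0) if a == "left" else min(s + 1, GOAL)

def reward(s, a):
    """R(s, a): 1 for stepping into the goal room, otherwise 0."""
    return 1.0 if s != GOAL and next_state(s, a) == GOAL else 0.0

def bellman_backup(s, V):
    """V(s) = max over a of [R(s, a) + GAMMA * V(s')]."""
    return max(reward(s, a) + GAMMA * V[next_state(s, a)] for a in ACTIONS)

# Apply the equation repeatedly until the values stop changing.
V = {s: 0.0 for s in STATES}
for _ in range(50):
    V = {s: bellman_backup(s, V) for s in STATES}

print(V)
```

The room next to the goal ends up with value 1, and the room one step further away gets that value multiplied by the discount factor.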

The Bellman equation is used to choose the next action for your model: given a known reward function, you pick the action a that achieves this maximum.
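Choosing the action is then an argmax instead of a max. A small sketch, assuming the values V are already known and that transitions and rewards are stored in lookup tables; the tables `NEXT` and `R`, the numbers in `V`, and the function `greedy_action` are all hypothetical.

```python
GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical three-room corridor; room 2 is the goal and has no actions.
NEXT = {(0, "left"): 0, (0, "right"): 1,
        (1, "left"): 0, (1, "right"): 2}      # s' for each (s, a)
R = {(0, "left"): 0.0, (0, "right"): 0.0,
     (1, "left"): 0.0, (1, "right"): 1.0}     # R(s, a)
V = {0: 0.9, 1: 1.0, 2: 0.0}                  # assumed, already computed

def greedy_action(s):
    """Return the action a that maximises R(s, a) + GAMMA * V(s')."""
    actions = [a for (state, a) in NEXT if state == s]
    return max(actions, key=lambda a: R[(s, a)] + GAMMA * V[NEXT[(s, a)]])

print(greedy_action(0), greedy_action(1))
```

Using `max` with a `key` function returns the action itself rather than its score, which is what you want when the robot has to decide where to move.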

Please correct me if anything is wrong. Thanks.

Jiao Dian