
I have a simple game on a grid. 25 states, five actions per state (left, right, up, down, stay). There might be special rules for edges and corners, but these won't matter here.
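For concreteness, here's roughly how I'm encoding the grid. The edge rule shown (moves off the board clip to the boundary) is just one choice; as I said, it won't matter here:

```python
import numpy as np

N = 5  # 5x5 grid -> 25 states, indexed 0..24 in row-major order
ACTIONS = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0), "stay": (0, 0)}

def step(state, action):
    """Deterministic transition; moves off the edge clip to the boundary."""
    r, c = divmod(state, N)
    dr, dc = ACTIONS[action]
    r = min(max(r + dr, 0), N - 1)
    c = min(max(c + dc, 0), N - 1)
    return r * N + c
```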

My reward matrix (below) is pretty sparse, but this is all the data I have or ever will have. I have to make inferences about missing reward data.

Q-learning is itself a kind of interpolation scheme, but taking my reward matrix at face value (treating missing entries as zero) will mean moves toward D4 always have more value than moves toward B2. (Presumably B2 > D4 in the full reward matrix.)
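To make the face-value problem concrete: in the tabular update, rewards enter only through the sampled r, so a sparse matrix effectively supplies 0 for every missing entry. A minimal sketch of the update I mean:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; a missing reward read at face value is r = 0."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```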

I can think of a dozen ways to blend the reward matrix, but for purely aesthetic reasons I'm hoping there's some canonical (and iterative) interpolation scheme for Q-learning rewards. Something that will blend somewhat seamlessly into the basic Q-learning algorithm.
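For example, one blend I've considered (surely not the canonical one, if a canonical one exists): harmonic interpolation, i.e. repeatedly average each unobserved cell over its grid neighbors while pinning the observed rewards. It's itself a fixed-point iteration, so it at least sits naturally next to the Q-learning loop. A sketch in NumPy, with NaN marking missing entries:

```python
import numpy as np

def harmonic_fill(R, observed, iters=200):
    """Fill unobserved rewards by Jacobi-style neighbor averaging,
    keeping observed entries fixed; edges replicate their border values."""
    R = R.copy()
    R[~observed] = 0.0
    for _ in range(iters):
        padded = np.pad(R, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1]
               + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        R = np.where(observed, R, avg)
    return R
```

With only two observed cells, every filled value ends up between the two pinned rewards, so nearer observations dominate, which is the blending behavior I'm after.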

[reward matrix — image not reproduced here]

Thank you.

Shay
