
I have a simple game on a grid. 25 states, five actions per state (left, right, up, down, stay). There might be special rules for edges and corners, but these won't matter here.
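For concreteness, here's roughly how I'm encoding the grid. The edge rule shown (moves off the board clip to the boundary) is just one choice; as I said, it won't matter here:

```python
import numpy as np

N = 5  # 5x5 grid -> 25 states, indexed 0..24 in row-major order
ACTIONS = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0), "stay": (0, 0)}

def step(state, action):
    """Deterministic transition; moves off the edge clip to the boundary."""
    r, c = divmod(state, N)
    dr, dc = ACTIONS[action]
    r = min(max(r + dr, 0), N - 1)
    c = min(max(c + dc, 0), N - 1)
    return r * N + c
```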

My reward matrix (below) is pretty sparse, but this is all the data I have or ever will have. I have to make inferences about missing reward data.

Q-learning is itself a kind of interpolation scheme, but taking my reward matrix at face value (treating missing entries as zero) will mean moves toward D4 always have more value than moves toward B2. (Presumably B2 > D4 in the full reward matrix.)
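To make the face-value problem concrete: in the tabular update, rewards enter only through the sampled r, so a sparse matrix effectively supplies 0 for every missing entry. A minimal sketch of the update I mean:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; a missing reward read at face value is r = 0."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```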

I can think of a dozen ways to blend the reward matrix, but for purely aesthetic reasons I'm hoping there's some canonical (and iterative) interpolation scheme for Q-learning rewards. Something that will blend somewhat seamlessly into the basic Q-learning algorithm.
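For example, one blend I've considered (surely not the canonical one, if a canonical one exists): harmonic interpolation, i.e. repeatedly average each unobserved cell over its grid neighbors while pinning the observed rewards. It's itself a fixed-point iteration, so it at least sits naturally next to the Q-learning loop. A sketch in NumPy, with NaN marking missing entries:

```python
import numpy as np

def harmonic_fill(R, observed, iters=200):
    """Fill unobserved rewards by Jacobi-style neighbor averaging,
    keeping observed entries fixed; edges replicate their border values."""
    R = R.copy()
    R[~observed] = 0.0
    for _ in range(iters):
        padded = np.pad(R, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1]
               + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        R = np.where(observed, R, avg)
    return R
```

With only two observed cells, every filled value ends up between the two pinned rewards, so nearer observations dominate, which is the blending behavior I'm after.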

[reward matrix — image not reproduced here]

Thank you.

Shay
