
Question in title.

I've been searching around and can't seem to find much of an explanation about the two.

Say you have a model that uses 20% of its rank calls for exploration. I suspect matched rewards count how many times it was rewarded within the remaining 80% of calls.
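To make my guess concrete, here is a minimal sketch of the counting I have in mind. It is only an illustration of my assumption, not the service's documented definition: the function name `simulate`, the fixed `reward_prob`, and the dictionary keys are all made up for this example.

```python
import random


def simulate(n_calls, explore_rate=0.2, reward_prob=0.5, seed=0):
    """Tally rewards separately for exploration and exploitation calls.

    Hypothetical model: every rank call is independently an exploration
    call with probability `explore_rate`, and every call is rewarded with
    the same (made-up) probability `reward_prob`.
    """
    rng = random.Random(seed)
    counts = {
        "explore_calls": 0,
        "exploit_calls": 0,
        "explore_rewards": 0,
        "exploit_rewards": 0,
    }
    for _ in range(n_calls):
        kind = "explore" if rng.random() < explore_rate else "exploit"
        counts[f"{kind}_calls"] += 1
        if rng.random() < reward_prob:
            counts[f"{kind}_rewards"] += 1
    return counts


if __name__ == "__main__":
    counts = simulate(10_000)
    # Under my guess, "matched rewards" would correspond to exploit_rewards.
    print(counts)
```

If my guess were right, the matched figure would track `exploit_rewards`, which should sit at roughly 80% of all rewards, not far below it.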

Can anyone confirm this?

By recording events locally, I can confirm that learned events and observed rewards match up, but I'm struggling to explain why matched events are so low. Example graph:

graph

Ross Perry

0 Answers