Question in title.
I've been searching around and can't seem to find much of an explanation about the two.
Say you have a model that uses 20% of its rank calls for exploration. I suspect matched rewards count how many of the remaining 80% of calls were rewarded.
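To make my guess concrete, here is a rough Python sketch of what I mean. The `Event` fields and both helper functions are made up for illustration and are not part of any real API; the definition of matched rewards in it is only my assumption, not something I have seen documented.

```python
from dataclasses import dataclass


@dataclass
class Event:
    explored: bool  # hypothetical flag: True if the rank call was used for exploration
    reward: float   # reward sent back for this event (0.0 if none)


def count_observed_rewards(events):
    # Every event that received a positive reward, explored or not.
    return sum(1 for e in events if e.reward > 0)


def count_matched_rewards(events):
    # ASSUMPTION (my guess only): "matched" means rewarded events
    # among the non-exploration calls.
    return sum(1 for e in events if not e.explored and e.reward > 0)


# 10 rank calls: 2 used for exploration (20%), 8 not (80%).
events = (
    [Event(explored=True, reward=1.0), Event(explored=True, reward=0.0)]
    + [Event(explored=False, reward=1.0)] * 5
    + [Event(explored=False, reward=0.0)] * 3
)

observed = count_observed_rewards(events)
matched = count_matched_rewards(events)
print(observed, matched)
```

If this reading were right, matched rewards could never exceed observed rewards, and the gap between the two would come only from the exploration calls.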
Can anyone confirm this?
By recording events locally, I can confirm that learned events and observed rewards match up, but I am struggling to explain why matched events are so low. Example graph: