I'm working on several variants of DQN algorithms and I want to compare their learning efficiency. I've seen a couple of graphs showing average Q-value per episode in some GitHub repositories. I'm confused because the neural network gives me a Q-value for each action at every step in the game. How do I compute the values plotted in "average Q-value per episode" graphs?
One way to do this would be to keep track of the Q-value at each step and the number of steps taken in the episode. To get the average Q-value per episode, you simply sum the per-step Q-values and divide by the number of steps in the episode. Or, more formally:
$$\bar{Q} = \frac{1}{N} \sum_{i=1}^{N} Q_i$$

where $N$ is the total number of steps in the episode and $Q_i$ is the Q-value at step $i$.
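Since the network outputs one Q-value per action at each step, you first have to reduce that vector to a single scalar $Q_i$ per step; one common choice is the Q-value of the action actually taken (which, under a greedy policy, is the maximum over actions). Here is a minimal sketch, assuming a classic Gym-style `env` and a `q_net` callable that maps a state to a vector of per-action Q-values; both names are placeholders for your own environment and network:

```python
import numpy as np

def average_q_per_episode(q_net, env, num_episodes=10):
    """Average per-step Q-value for each episode: (1/N) * sum_i Q_i.

    Assumes a classic Gym-style env (reset() -> state,
    step(action) -> (state, reward, done, info)) and a q_net
    callable mapping a state to a vector of per-action Q-values.
    """
    averages = []
    for _ in range(num_episodes):
        state = env.reset()
        q_values = []
        done = False
        while not done:
            q = q_net(state)            # one Q-value per action
            action = int(np.argmax(q))  # greedy action
            q_values.append(float(q[action]))  # Q-value of the action taken
            state, _, done, _ = env.step(action)
        averages.append(sum(q_values) / len(q_values))  # (1/N) * sum_i Q_i
    return averages
```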
That being said, it's a little odd to me that you would track Q-values this way: each state/action pair has its own Q-value associated with it, so the per-episode average I've suggested here may not be very informative. Maybe you mean average "reward" instead?

Eugen Hotaj