I'm working on several variants of DQN algorithms and I want to compare their learning efficiency. I've seen a couple of graphs showing average Q-value per episode in some GitHub repositories. I'm confused because the neural network gives me a Q-value for each action at every step in the game. How do I compute the values plotted in "average Q-value per episode" graphs?
One way to do this would be to keep track of the Q-value at each step and the number of steps taken in the episode. To get the average Q-value per episode, you simply sum the per-step Q-values and divide by the number of steps in the episode. Or, more formally:
$$\bar{Q} = \frac{1}{N} \sum_{i=1}^{N} Q_i$$

where $N$ is the total number of steps in the episode and $Q_i$ is the Q-value at step $i$.
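Since the network outputs one Q-value per action at each step, you first have to reduce that vector to a single scalar $Q_i$ per step; one common choice is the Q-value of the action actually taken (which, under a greedy policy, is the maximum over actions). Here is a minimal sketch, assuming a classic Gym-style `env` and a `q_net` callable that maps a state to a vector of per-action Q-values; both names are placeholders for your own environment and network:

```python
import numpy as np

def average_q_per_episode(q_net, env, num_episodes=10):
    """Average per-step Q-value for each episode: (1/N) * sum_i Q_i.

    Assumes a classic Gym-style env (reset() -> state,
    step(action) -> (state, reward, done, info)) and a q_net
    callable mapping a state to a vector of per-action Q-values.
    """
    averages = []
    for _ in range(num_episodes):
        state = env.reset()
        q_values = []
        done = False
        while not done:
            q = q_net(state)            # one Q-value per action
            action = int(np.argmax(q))  # greedy action
            q_values.append(float(q[action]))  # Q-value of the action taken
            state, _, done, _ = env.step(action)
        averages.append(sum(q_values) / len(q_values))  # (1/N) * sum_i Q_i
    return averages
```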
That being said, it's a little odd to me that you would track Q-values this way: each state/action pair has its own Q-value associated with it, so the per-episode average I've suggested here may not be very informative. Maybe you mean average "reward" instead?

Eugen Hotaj