Learning Curve in Q-learning

Question

I wrote the Q-learning algorithm in C++ with an epsilon-greedy policy, and now I have to plot the learning curve for the Q-values. What exactly should I plot? I have an 11x5 Q matrix, so should I take a single Q-value and plot how it is learned, or should I use the whole matrix for the learning curve? Could you guide me? Thank you.

score 0 · Accepted Answer · answered Feb 06 '22 at 15:44


Learning curves in RL are typically plots of returns over time, not Q-losses or anything similar. So you should run your agent in the environment, compute the total reward collected in each run (the return, for example per episode), and plot it against the corresponding time, such as the episode number.
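As a rough illustration of the bookkeeping this involves, here is a minimal C++ sketch. It assumes the training loop already stores the rewards of each episode in a `std::vector<double>`; the helper names `episodeReturn`, `movingAverage`, and `writeLearningCurve`, the CSV layout, and the smoothing window are all hypothetical and not taken from the asker's code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <numeric>
#include <string>
#include <vector>

// Undiscounted return of one episode: the sum of all rewards the
// environment handed back during that episode.
double episodeReturn(const std::vector<double>& rewards) {
    return std::accumulate(rewards.begin(), rewards.end(), 0.0);
}

// Trailing moving average over at most `window` episodes. Per-episode
// returns are noisy under epsilon-greedy exploration, so a smoothed
// curve is often easier to read. `window` must be at least 1.
std::vector<double> movingAverage(const std::vector<double>& returns,
                                  std::size_t window) {
    assert(window > 0);
    std::vector<double> smoothed;
    smoothed.reserve(returns.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < returns.size(); ++i) {
        sum += returns[i];
        if (i >= window) {
            sum -= returns[i - window];  // drop the value leaving the window
        }
        const std::size_t n = (i + 1 < window) ? i + 1 : window;
        smoothed.push_back(sum / static_cast<double>(n));
    }
    return smoothed;
}

// Write one CSV row per episode: episode index, raw return, smoothed return.
// Returns false if the file could not be opened.
bool writeLearningCurve(const std::string& path,
                        const std::vector<double>& returns,
                        std::size_t window) {
    std::ofstream out(path);
    if (!out) {
        return false;
    }
    const std::vector<double> smoothed = movingAverage(returns, window);
    out << "episode,return,smoothed\n";
    for (std::size_t i = 0; i < returns.size(); ++i) {
        out << i << ',' << returns[i] << ',' << smoothed[i] << '\n';
    }
    return true;
}
```

In the training loop, one would push every reward received from the environment into a per-episode vector, call `episodeReturn` when the episode ends, and append the result to a vector of returns. After training, a call such as `writeLearningCurve("learning_curve.csv", returns, 50)` produces a file that any plotting tool can read, with the episode number on the x-axis and the return on the y-axis.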


lejlot


Thank you for your reply. So that means I have to add up all the reward values coming from the environment for the different actions taken and plot the sum? But isn't Q-learning already doing that using the Bellman equation? – Nifty Feb 07 '22 at 15:12
Yes, that's what you need to do. The Bellman equation updates the Q-value estimates; it does not track the actual rewards accumulated over an episode. – lejlot Feb 07 '22 at 20:11
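
To make the distinction in the comments concrete, the two quantities can be written side by side. This uses the usual textbook notation (learning rate \alpha, discount factor \gamma, episode length T), which is an assumption and not taken from the asker's code. The first line is the standard tabular Q-learning update, which bootstraps from the current Q estimates; the second is the undiscounted return of one episode, which is what the learning curve plots.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

G = \sum_{t=0}^{T-1} r_{t+1}
```

The update uses only the single reward r_{t+1} of the current step, so the sum G is never computed by the algorithm itself and has to be accumulated separately.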
