I wrote the Q-learning algorithm in C++ with an epsilon-greedy policy, and now I have to plot the learning curve for the Q-values. What exactly should I plot? I have an 11x5 Q matrix, so should I take a single Q-value and plot how it changes during learning, or should I use the whole matrix for the learning curve? Could you guide me on this? Thank you.

Nifty

1 Answer

Learning curves in RL are typically plots of returns over time, not Q-losses or anything like that. So you should run your agent in the environment, compute the total reward collected in each episode (a.k.a. the return), and plot it against the corresponding time, such as the episode number.
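As a rough illustration of the idea above, here is a minimal C++17 sketch that accumulates the return inside an ordinary tabular Q-learning loop. Everything environment-specific is an assumption made up for the example: the `step` function, the chain layout, the reward values, and the hyperparameters are hypothetical, and only the 11x5 table size comes from the question. The helper names `run_q_learning` and `moving_average` are also invented here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>
#include <vector>

constexpr int kStates = 11;   // matches the 11x5 Q matrix from the question
constexpr int kActions = 5;

// Hypothetical toy environment: a chain of 11 cells with the goal at the end.
// Action a moves the agent by (a - 2) cells; each step costs -1 and
// reaching the goal pays +10. Replace this with your own environment.
struct StepResult { int next_state; double reward; bool done; };

StepResult step(int state, int action) {
    int next = std::clamp(state + (action - 2), 0, kStates - 1);
    bool done = (next == kStates - 1);
    return {next, done ? 10.0 : -1.0, done};
}

// Runs epsilon-greedy Q-learning and returns one total reward per episode.
// These per-episode returns are the values to put on the learning curve.
std::vector<double> run_q_learning(int episodes, unsigned seed = 0) {
    const double alpha = 0.1, gamma = 0.95, epsilon = 0.1;  // assumed values
    const int max_steps = 100;                              // assumed cap
    std::array<std::array<double, kActions>, kStates> Q{};  // all zeros
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uniform_int_distribution<int> random_action(0, kActions - 1);

    std::vector<double> returns;
    returns.reserve(episodes);
    for (int ep = 0; ep < episodes; ++ep) {
        int s = 0;
        double episode_return = 0.0;  // sum of actual rewards this episode
        for (int t = 0; t < max_steps; ++t) {
            int a;
            if (unif(rng) < epsilon) {
                a = random_action(rng);  // explore
            } else {                     // exploit: argmax over Q[s]
                a = static_cast<int>(
                    std::max_element(Q[s].begin(), Q[s].end()) - Q[s].begin());
            }
            StepResult r = step(s, a);
            episode_return += r.reward;  // this is what gets plotted
            double best_next = r.done ? 0.0
                : *std::max_element(Q[r.next_state].begin(),
                                    Q[r.next_state].end());
            // Q-learning update: bootstraps from Q estimates.
            Q[s][a] += alpha * (r.reward + gamma * best_next - Q[s][a]);
            s = r.next_state;
            if (r.done) break;
        }
        returns.push_back(episode_return);
    }
    return returns;
}

// Trailing moving average to smooth the noisy curve; window must be >= 1.
std::vector<double> moving_average(const std::vector<double>& xs,
                                   std::size_t window) {
    std::vector<double> out;
    out.reserve(xs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum += xs[i];
        if (i >= window) sum -= xs[i - window];
        out.push_back(sum / static_cast<double>(std::min(i + 1, window)));
    }
    return out;
}
```

One way to use it would be to call `run_q_learning(500)`, optionally smooth the result with `moving_average(returns, 20)`, write the episode index and value pairs to a CSV file, and plot them with whatever tool you prefer. Because epsilon-greedy exploration makes individual episodes noisy, a smoothed curve (or an average over several seeds) is usually easier to read than the raw returns.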

lejlot
  • Thank you for your reply. So that means I have to add up all the rewards the environment returns for the actions taken and plot that sum? But isn't Q-learning already doing this through the Bellman equation? – Nifty Feb 07 '22 at 15:12
  • Yes, that's what you need to do. The Bellman equation looks at Q-values, which are estimates, not at the actual rewards collected. – lejlot Feb 07 '22 at 20:11