Flow Chart
I'm having trouble understanding the 4th and 5th steps in the flowchart.
Am I right to say that the Q value of a particular state and action is the same as the state-action pair value of that same state and action?
For the 4th step, does 'calculate return for a state-action pair' mean the same as finding the state-action pair value of that particular state and action?
For the 5th step, 'update the Q function by taking the average of returns' is confusing. From what I understand, the Q function is basically the state-action pair values put in a table (the Q table). To update it means to adjust the state-action pair values of the individual states and their respective actions (e.g. state 1 action 1, state 3 action 1, state 3 action 2, and so on). I'm not sure what 'average of returns' means, though. Is it asking me to take the average of the returns after x episodes? (From my understanding, the return is the sum of rewards in one episode, so AVG = sum of returns / x.) And what do I do with that average? I'm also a little confused when they say 'update the Q function', because the Q function consists of many parameters that must be updated (the individual state-action pair values), and I'm not sure which one they are referring to.
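To pin down what I'm asking, here is a minimal Python sketch of one possible reading of steps 4 and 5. I'm not sure this is what the flowchart intends. The episodes, rewards and names (`episodes`, `returns`, `Q`, `gamma`) are all made up for illustration. I'm also assuming that the return for a pair is the sum of rewards from that step to the end of the episode, with no discounting, and that each pair keeps its own running list of returns.

```python
from collections import defaultdict

# Hypothetical data: each episode is a list of (state, action, reward) steps.
episodes = [
    [(1, 1, 0.0), (3, 1, 1.0), (4, 3, 2.0)],
    [(1, 1, 0.0), (3, 2, 0.0), (4, 3, 2.0)],
]

gamma = 1.0  # assumption: no discounting

returns = defaultdict(list)  # (state, action) -> returns seen so far
Q = {}                       # the Q table: (state, action) -> value estimate

for episode in episodes:
    G = 0.0
    # Walk backwards so G accumulates the rewards from each step to the end.
    for state, action, reward in reversed(episode):
        G = reward + gamma * G
        # "Step 4" under this reading: record the return for this pair.
        returns[(state, action)].append(G)
        # "Step 5" under this reading: only this pair's entry is updated,
        # to the average of the returns recorded for it.
        pair_returns = returns[(state, action)]
        Q[(state, action)] = sum(pair_returns) / len(pair_returns)

for pair in sorted(Q):
    print(pair, Q[pair])
```

Under this reading there is one average per state-action pair rather than a single average for the whole table, and each update only touches the pairs that appeared in the episode. Is that the right interpretation?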
Also, what is the point of calculating the average of returns? As I understand it, the state-action pair value for a particular state and a particular action will always be the same (e.g. if I always take action 3 in state 4, I will always get value = 2).
Thanks :)