Flow Chart

I'm having trouble understanding the 4th and 5th steps in the flowchart.

Am I right to say that the Q value of a particular state and action is the same as the state-action pair value of that same state and action?

For the 4th step, does 'calculate return for a state-action pair' mean the same as finding the state-action pair value of that particular state?

For the 5th step, 'update the Q function by taking the average of returns' is confusing. From what I understand, the Q function is basically the state-action pair values put in a table (the Q table). To update it means to adjust the state-action pair values of the individual states and their respective actions (e.g. state 1 action 1, state 3 action 1, state 3 action 2, and so on). I'm not sure what 'average of returns' means, though. Is it asking me to take the average of the returns after x episodes? (From my understanding, the return is the sum of rewards in one episode, so AVG = sum of returns / x.) And what do I do with that average? I'm a little confused when they say 'update the Q function', because the Q function consists of many parameters that must be updated (the individual state-action pair values), and I'm not sure which one they are referring to.

Also, what is the point of calculating the average of returns, since the state-action pair value for a particular state and a particular action will always be the same (e.g. if I always take action 3 in state 4, I will always get value = 2)?
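To make steps 4 and 5 concrete, here is a minimal sketch of one possible reading of them (every-visit Monte Carlo averaging). This is an assumption about what the flowchart means, not something it states: the function names `returns_per_pair` and `update_q`, the discount `gamma`, and the two toy episodes are all made up for illustration. In this reading, the return is computed from each state-action pair onwards (not once for the whole episode), and each pair keeps its own running average.

```python
from collections import defaultdict


def returns_per_pair(episode, gamma=1.0):
    """Assumed reading of step 4: the return G following each (state, action).

    `episode` is a list of (state, action, reward) tuples in time order.
    """
    g = 0.0
    out = []
    # Walk backwards so each return reuses the one computed after it.
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        out.append(((state, action), g))
    out.reverse()
    return out


def update_q(q, counts, episode, gamma=1.0):
    """Assumed reading of step 5: keep a running average of returns per pair."""
    for pair, g in returns_per_pair(episode, gamma):
        counts[pair] += 1
        # Incremental mean: new_avg = old_avg + (sample - old_avg) / n
        q[pair] += (g - q[pair]) / counts[pair]


q = defaultdict(float)     # Q table: (state, action) -> average return so far
counts = defaultdict(int)  # how many returns were averaged for each pair

# Hypothetical episodes: same states and actions, but different rewards.
episodes = [
    [(4, 3, 0.0), (5, 1, 1.0)],
    [(4, 3, 0.0), (5, 1, 3.0)],
]
for ep in episodes:
    update_q(q, counts, ep)

print(q[(4, 3)])  # average of the returns 1.0 and 3.0 seen after (state 4, action 3)
```

In this toy data, taking action 3 in state 4 is followed by a return of 1.0 in one episode and 3.0 in the other, so a single episode's return would not equal the value 2; only the average over episodes does.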

Thanks :)

desertnaut
BG10
  • Hi. It's better you ask this question on [Artificial Intelligence Stack Exchange](https://ai.stackexchange.com/). Also, please, if you decide to do so, ask only one question per post. – nbro Apr 15 '20 at 11:57
  • Alright, thanks :) – BG10 Apr 15 '20 at 13:04

0 Answers