Since I am a beginner in this field, I have a question about how different epsilon values affect SARSA and Q-learning when the epsilon-greedy algorithm is used for action selection.
I understand that when epsilon is equal to 0, actions are always chosen greedily according to the policy derived from Q. Q-learning therefore updates Q first and then selects the next action based on the updated Q, whereas SARSA chooses the next action first and updates Q afterwards.
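To make the comparison concrete, here is a minimal sketch of epsilon-greedy selection and the two update targets. The function names, the tiny Q-table, the reward, and the discount factor `gamma` are all made up for illustration and are not taken from any particular library.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def sarsa_target(q, reward, next_state, next_action, gamma):
    # SARSA bootstraps from the action that was actually selected in next_state.
    return reward + gamma * q[next_state][next_action]

def q_learning_target(q, reward, next_state, gamma):
    # Q-learning bootstraps from the best action in next_state,
    # whatever action is taken afterwards.
    return reward + gamma * max(q[next_state])

# Hypothetical Q-table: 2 states, 2 actions (values chosen arbitrarily).
q = [[0.0, 0.0], [1.0, 3.0]]
rng = random.Random(0)

# epsilon = 0: the selected next action is always the greedy one.
next_action = epsilon_greedy(q[1], 0.0, rng)
sarsa_t = sarsa_target(q, 1.0, 1, next_action, 0.9)
ql_t = q_learning_target(q, 1.0, 1, 0.9)

# The two targets coincide when epsilon is 0.
print(sarsa_t == ql_t)
```

Changing the `epsilon` argument in the call to `epsilon_greedy` shows how often the selected action, and so the SARSA target, departs from the greedy one.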
What happens when ε is equal to 1, and as ε increases from 0 to 1?
Thank you!