
Since I am a beginner in this field, I have a question about how different epsilon values affect SARSA and Q-learning when using the epsilon-greedy algorithm for action selection.

I understand that when epsilon equals 0, actions are always chosen based on a policy derived from Q. Q-learning first updates Q and then selects the next action based on the updated Q, whereas SARSA chooses the next action first and updates Q afterwards.

What happens when ε equals 1? And as ε increases from 0 to 1?

Thank you!

1 Answer


The ε-greedy policy selects a random action with probability ε or the best known action with probability 1-ε. At ε=1 it will always pick the random action. This value makes the trade-off between exploration and exploitation: you want to use the knowledge you have, but you also want to search for better alternatives.
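As a minimal sketch, this selection rule can be written in a few lines of Python (the function name and Q-value list are illustrative, not from the original post):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon,
    otherwise the greedy (highest-Q) action."""
    if random.random() < epsilon:
        # Explore: uniformly random action, ignoring Q
        return random.randrange(len(q_values))
    # Exploit: action with the highest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With ε=0 this always returns the greedy action; with ε=1 it always returns a uniformly random action; values in between mix the two. Note that both SARSA and Q-learning can use this same selection rule — they differ in how they *update* Q, not in how they pick actions.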

Don Reba
  • To sum up: when ε=0 it always exploits, acting only on the knowledge we have, but when ε=1 it takes random actions, exploring to find out whether there is a better way. – user3064688 Nov 17 '15 at 10:11
  • At ε=1 it will explore only and disregard any knowledge it gains. – Don Reba Nov 17 '15 at 20:06