
I am teaching an agent to get out of a maze, collecting all the apples on its way, using Q-learning.

I read that it is possible to keep a fixed epsilon, or to choose an initial epsilon and decay it as time passes.

I couldn't find the advantages and disadvantages of each approach, and I would love to hear more if you can help me understand which one I should use.

Catarina Nogueira

1 Answer


I'm going to assume you're referring to epsilon as in "epsilon-greedy exploration". The goal of this parameter is to control how much your agent believes in its current policy. With a large epsilon value, your agent will tend to ignore its policy and choose random actions. This exploration is often a good idea when your policy is rather weak, especially at the beginning of training. Sometimes, people decay epsilon as time passes to reflect that their policy is getting better and better, and that they want to exploit rather than explore.

There is no right way to pick epsilon, or its decay rate, for every problem. The best way is probably to try out different values.
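As a concrete illustration of the mechanism described above, here is a minimal epsilon-greedy action-selection sketch (the function name and signature are mine, not from the answer): with probability epsilon the agent picks a random action, otherwise it picks the action with the highest Q-value.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over a list of Q-values for one state.

    With probability `epsilon`, explore: pick a uniformly random action.
    Otherwise, exploit: pick the action with the highest Q-value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0` this always exploits the current policy; with `epsilon=1` it ignores the policy entirely, which is the trade-off the answer describes.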

francoisr
  • Hi! Thanks for the answer! Can you give me a concrete example of when not decaying epsilon and leaving it fixed is a good idea? – Catarina Nogueira Nov 10 '19 at 19:10
  • Most applications I've seen actually don't decay, and keep a fairly small epsilon (like `0.05`) throughout training, and sometimes when applying the policy as well. – francoisr Nov 10 '19 at 19:41
  • 1
    But if you want to start from a larger epsilon, then decaying it is a good idea because otherwise you never fully exploit and stabilize the policy you're learning. You don't want to set it to zero, but decaying to a small value is good in most cases. Note that there is usually some degree of flexibility with respect to the exact value of epsilon: setting different value might allow to converge to similar policies. The point is setting a value to small will get the agent stuck in local minima because it doesn't explore enough, and setting it too high will prevent it from learning anything. – francoisr Nov 10 '19 at 19:42
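The decay-toward-a-floor scheme described in the comments could be sketched like this (the function and the specific constants are illustrative assumptions, not values given by the answerer): start from a large epsilon, shrink it exponentially each episode, and clamp it at a small floor so exploration never stops entirely.

```python
def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.995):
    """Exponentially decay epsilon per episode, clamped at a small floor.

    `start`: initial epsilon (heavy exploration early on).
    `end`: the floor; epsilon never drops below this, so some
    exploration always remains, as the comment recommends.
    `decay`: per-episode multiplicative factor (illustrative value).
    """
    return max(end, start * decay ** episode)
```

A fixed-epsilon setup is just the special case `start == end`, which matches the commenter's observation that many applications simply keep a small constant epsilon throughout training.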