I am teaching an agent to get out of a maze, collecting all the apples on its way, using Q-learning.
I read that it is possible either to keep epsilon fixed or to choose an initial epsilon and decay it as training progresses.
I couldn't find the advantages or disadvantages of each approach. I would love to hear more if you can help me understand which one I should use.
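
To make the two options concrete, here is a minimal sketch of what I mean. The function names and the constants (0.1, 1.0, 0.05, 0.99) are my own placeholders, not values taken from any particular source or library, and the exponential schedule is just one possible way to decay epsilon.

```python
import random


def fixed_epsilon(episode, epsilon=0.1):
    """Option 1: the exploration rate never changes (0.1 is an assumed value)."""
    return epsilon


def decayed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Option 2: start high and shrink every episode, never going below eps_min.

    Exponential decay is an assumption here; linear schedules are also common.
    """
    return max(eps_min, eps_start * decay ** episode)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value (first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Compare the two schedules at a few points during training.
    for episode in (0, 10, 100, 1000):
        print(episode, fixed_epsilon(episode), decayed_epsilon(episode))
```

With the fixed version the agent explores just as much in the last episode as in the first, while with the decayed version it explores a lot early on and then mostly follows its learned Q-values.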