6

The question is how the learning rate influences the convergence rate and the convergence itself. If the learning rate is constant, will the Q function converge to the optimal one, or must the learning rate necessarily decay to guarantee convergence?

uduck
  • With a sufficiently **small** learning rate you have a convergence guarantee for a convex Q-learning problem. – Thomas Jungblut Oct 08 '15 at 15:27
  • I assume there is also a dependence on the nature of the MDP. I ASSUME that convergence on an MDP with stochasticity in the state transitions and/or in the reward function requires the conditions posted by @purpletentacle. However, I also ASSUME that if there is no stochasticity in either the transitions or the reward, the learning rate does not need to decay. Insights from someone who knows (preferably with supporting literature) would be appreciated. – ALM Feb 09 '18 at 17:36

3 Answers

7

The learning rate determines the magnitude of the step taken towards the solution.

It should not be too large, or the updates may oscillate around the minimum indefinitely; nor should it be too small, or it will take many iterations to reach the minimum.

The reason decay is advised for the learning rate is that initially, when we are at a completely random point in the solution space, we need to take big leaps towards the solution; later, when we get close to it, we make small jumps, and hence small refinements, to finally reach the minimum.

An analogy can be made with golf: when the ball is far away from the hole, the player hits it very hard to get as close to the hole as possible. Later, when he reaches the flagged area, he chooses a different club for an accurate short shot.

It is not that he could not put the ball in the hole without the short-shot club; he might just send the ball past the target two or three times. But it is best to play optimally and use the right amount of power to reach the hole. The same holds for a decayed learning rate.
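A minimal sketch of this idea in code (my own illustration, not part of the original answer): tabular Q-learning with a learning rate that shrinks over time. The environment interface (`env.reset()` / `env.step()`) and the decay schedule `alpha0 / (1 + decay * t)` are assumptions made for the example.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               gamma=0.99, alpha0=1.0, decay=0.01, epsilon=0.1):
    """Tabular Q-learning with a decaying learning rate (illustrative sketch)."""
    Q = np.zeros((n_states, n_actions))
    t = 0  # global step counter driving the decay schedule
    for _ in range(episodes):
        s = env.reset()            # assumed interface: returns an integer state
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)  # assumed interface: (state, reward, done)
            # big steps early on, small steps once we are close to the solution
            alpha = alpha0 / (1.0 + decay * t)
            target = r + gamma * np.max(Q[s_next]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
            t += 1
    return Q
```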

VishalTheBeast
2

The learning rate must decay, but not too fast. The conditions for convergence are the following:

  • $\sum_{t=1}^{\infty} \alpha_t = \infty$

  • $\sum_{t=1}^{\infty} \alpha_t^2 < \infty$

Something like $\alpha_t = k/(k+t)$ can work well.

This paper discusses exactly this topic:

http://www.jmlr.org/papers/volume5/evendar03a/evendar03a.pdf
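As a rough numerical illustration (my own sketch, not from the paper): the schedule $\alpha_t = k/(k+t)$ has a partial sum that keeps growing while its sum of squares stays bounded, which is exactly what the two conditions above ask for. The constant `k` and the truncation point `T` below are arbitrary choices.

```python
import numpy as np

# Check the two conditions numerically for alpha_t = k / (k + t).
k = 10.0
T = 10**6                      # truncation point; the true sums run to infinity
t = np.arange(1, T + 1)
alpha = k / (k + t)

print("partial sum of alpha_t:  ", alpha.sum())        # grows without bound as T increases
print("partial sum of alpha_t^2:", (alpha ** 2).sum()) # approaches a finite limit
```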

Juan Leni
  • Even-Dar & Mansour 2003 provide sufficient conditions. [Azar et al., 2011](https://hal.inria.fr/inria-00636615v2/document) contains, among other results, lower bounds on how good an approximation is possible when the learning rate is $\alpha_k = 1/(k+1)$ – VictorZurkowski Jan 28 '19 at 02:01
0

It should decay; otherwise there will be persistent fluctuations that keep provoking small changes in the policy.

Alpha0