Why is the learning rate for Q-learning important in stochastic environments?

Question

As stated on Wikipedia (https://en.wikipedia.org/wiki/Q-learning#Learning_Rate), the choice of learning rate matters for convergence when the problem is stochastic. I tried to find the intuition behind this, without a mathematical proof, but could not find one.

Specifically, I find it difficult to understand why updating Q-values slowly is beneficial in a stochastic environment. Could anyone please explain the intuition or motivation?

[Artificial Intelligence Stack Exchange](https://ai.stackexchange.com/) is probably a better place to ask theoretical questions related to reinforcement learning, so I suggest that you ask your question there. If you ask it there, please, delete it from here (to avoid cross-posting, which is generally discouraged). Your current question would be off-topic for Stack Overflow, given this is not even a programming question. — nbro, Nov 14 '20 at 02:33

score 1 · Accepted Answer · answered Nov 13 '20 at 07:12

Once the Q-values are close to the values they should converge to, a learning rate that is too high makes convergence impossible in a stochastic environment: each noisy sample pulls the estimate away again.

Think of it like a ball rolling into a funnel. The speed at which the ball rolls is like the learning rate. Because the environment is stochastic, the ball never heads straight for the hole; it always just misses. If the learning rate is too high, just missing is disastrous: the ball shoots right past the hole.

That is why you want to steadily decrease the learning rate. It is like the ball losing velocity due to friction, which will always allow it to drop into the hole no matter which direction it's coming from.
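To make the funnel picture concrete, here is a minimal sketch in Python. It is not full Q-learning: it assumes a single state and a single action whose reward is 1 or 0 with equal probability, so the value being estimated is 0.5. The names `estimate`, `spread`, and `alpha_fn` are made up for this illustration. The update `q += alpha * (reward - q)` is the Q-learning update with the next-state term dropped.

```python
import random

def estimate(alpha_fn, steps=20000, seed=0):
    """Track a single Q-value under noisy rewards; alpha_fn(t) gives the learning rate at step t."""
    rng = random.Random(seed)
    q = 0.0
    history = []
    for t in range(1, steps + 1):
        # Stochastic reward: 1 or 0 with equal probability (expected value 0.5).
        reward = 1.0 if rng.random() < 0.5 else 0.0
        # Move the estimate a fraction alpha of the way toward the new sample.
        q += alpha_fn(t) * (reward - q)
        history.append(q)
    return history

def spread(values):
    """Width of the range the estimate wandered over."""
    return max(values) - min(values)

constant = estimate(lambda t: 0.5)      # high, fixed learning rate
decaying = estimate(lambda t: 1.0 / t)  # learning rate shrinks over time

# Compare how much each estimate still moves during the last 1000 updates.
print("constant alpha, spread over last 1000 steps:", spread(constant[-1000:]))
print("decaying alpha, spread over last 1000 steps:", spread(decaying[-1000:]))
print("decaying alpha, final estimate:", decaying[-1])
```

With the fixed rate, every update moves the estimate halfway toward the latest reward, so it keeps bouncing between values near 0 and near 1 no matter how long you run it. With `alpha = 1 / t`, the estimate is the running average of all rewards seen so far, each new sample moves it less than the one before, and it settles near 0.5. That is the ball losing speed through friction.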
