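As stated on Wikipedia (https://en.wikipedia.org/wiki/Q-learning#Learning_Rate), the learning rate matters for convergence when the problem is stochastic: in a fully deterministic environment a learning rate of α = 1 is optimal, but in a stochastic one convergence requires the learning rate to decrease toward zero (under some technical conditions). I have tried to find the intuition behind this, without going through a mathematical proof, but could not find it.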
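For reference, this is the Q-learning update as I understand it from that page, written in the form that shows where the learning rate α enters. With α = 1 the old estimate is discarded entirely and replaced by the newest sample target:

$$
Q^{new}(S_t, A_t) \leftarrow (1 - \alpha)\, Q(S_t, A_t) + \alpha \left( R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) \right)
$$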
Specifically, I find it hard to understand why updating Q-values slowly is beneficial in a stochastic environment. Could anyone please explain the intuition or motivation?
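To make the question concrete, here is a minimal toy sketch of the behavior I mean. The setting is my own hypothetical example, not something from the Wikipedia article: a single action whose reward is +1 or -1 with equal probability, so its true value is 0. With γ = 0 the update reduces to Q ← Q + α(r - Q). The sketch compares a constant α = 1 against a decaying α_n = 1/n.

```python
import random

def run(alpha_schedule, n_steps=10_000, seed=0):
    """Estimate the value of one action whose reward is noisy.

    Hypothetical toy setting: reward is +1.0 or -1.0 with equal probability,
    so the true expected value (the Q-value we want) is 0.
    """
    rng = random.Random(seed)
    q = 0.0
    for n in range(1, n_steps + 1):
        r = rng.choice([1.0, -1.0])  # stochastic reward sample
        alpha = alpha_schedule(n)    # learning rate at step n
        q += alpha * (r - q)         # Q-learning update with gamma = 0
    return q

# alpha = 1: the estimate is overwritten by the latest sample every step.
q_alpha_one = run(lambda n: 1.0)
# alpha_n = 1/n: the estimate equals the running mean of all samples so far.
q_decaying = run(lambda n: 1.0 / n)

print(q_alpha_one, q_decaying)
```

With α = 1, `q_alpha_one` is always exactly +1.0 or -1.0 (the last reward seen), no matter how many steps are run, while `q_decaying` ends up close to the true value 0. This is the effect I would like to understand intuitively.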