In the context of gradient descent algorithms (not linear regression specifically): if the cost function is already at a local minimum, what happens next? How does the algorithm ever reach the global optimum?
In the case of linear regression there is only one minimum, so local minima are not a problem. But what about other cost functions that have both a local minimum and a global minimum?
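To make the question concrete, here is a minimal sketch of plain gradient descent on a non-convex 1D cost. The function, starting point, and learning rate are all made up for illustration; the point is just that once the gradient is (near) zero at a local minimum, the updates themselves go to zero:

```python
# Minimal sketch: plain gradient descent on a non-convex 1D cost.
# f(x) = x^4 - 4x^2 + x has a local minimum near x ≈ 1.36 and a
# global minimum near x ≈ -1.49 (both values are illustrative).

def f(x):
    return x**4 - 4 * x**2 + x

def grad(x):          # derivative f'(x)
    return 4 * x**3 - 8 * x + 1

x = 2.0               # start in the basin of the *local* minimum
lr = 0.01             # learning rate

for step in range(1000):
    x -= lr * grad(x) # update is proportional to the gradient

print(f"x = {x:.4f}, f(x) = {f(x):.4f}, gradient = {grad(x):.2e}")
```

Running this, x converges to roughly 1.36 and stays there: the gradient at that point is essentially zero, so every further update is essentially zero. Plain gradient descent has no built-in mechanism to climb back uphill toward the global minimum near x ≈ -1.49. Is the answer simply that it gets stuck, and that techniques like random restarts or momentum are needed to escape?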