Implementing naive gradient descent in python

Question

I'm trying to implement a very naive gradient descent in python. However, it looks like it goes into an infinite loop. Could you please help me debug it?

y = lambda x : x**2
dy_dx = lambda x : 2*x
def gradient_descent(function,derivative,initial_guess):
    optimum = initial_guess
    while derivative(optimum) != 0:
        optimum = optimum - derivative(optimum)
    else:
        return optimum
gradient_descent(y,dy_dx,5)

Edit:

Now I have this code, I really can't comprehend the output. P.s. It might freeze your CPU.

y = lambda x : x**2
dy_dx = lambda x : 2*x
def gradient_descent(function,derivative,initial_guess):
    optimum = initial_guess
    while abs(derivative(optimum)) > 0.01:
        optimum = optimum - 2*derivative(optimum)
        print((optimum,derivative(optimum)))
    else:
        return optimum
gradient_descent(y,dy_dx,5)

Now I'm trying to apply it to a regression problem, however the output doesn't appear to be correct as shown in the output below:

Output of gradient descent code below

import matplotlib.pyplot as plt
def stepGradient(x,y, step):
    b_current = 0 
    m_current = 0
    b_gradient = 0
    m_gradient = 0
    N = int(len(x))   
    for i in range(0, N):
        b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
        m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    while abs(b_gradient) > 0.01 and abs(m_gradient) > 0.01:
        b_current = b_current - (step * b_gradient)
        m_current = m_current - (step * m_gradient)
        for i in range(0, N):
            b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
            m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    return [b_current, m_current]

x = [1,2, 2,3,4,5,7,8]
y = [1.5,3,1,3,2,5,6,7]
step = 0.00001
(b,m) = stepGradient(x,y,step)


plt.scatter(x,y)
abline_values = [m * i + b for i in x]
plt.plot(x, abline_values, 'b')
plt.show()

Fixed :D

import matplotlib.pyplot as plt
def stepGradient(x,y):
    step = 0.001
    b_current = 0 
    m_current = 0
    b_gradient = 0
    m_gradient = 0
    N = int(len(x))   
    for i in range(0, N):
        b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
        m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    while abs(b_gradient) > 0.01 or abs(m_gradient) > 0.01:
        b_current = b_current - (step * b_gradient)
        m_current = m_current - (step * m_gradient)
        b_gradient= 0
        m_gradient = 0
        for i in range(0, N):
            b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
            m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    return [b_current, m_current]

x = [1,2, 2,3,4,5,7,8,10]
y = [1.5,3,1,3,2,5,6,7,20]
(b,m) = stepGradient(x,y)


plt.scatter(x,y)
abline_values = [m * i + b for i in x]
plt.plot(x, abline_values, 'b')
plt.show()

The thing with gradient descent is that it very rarely reaches a derivative of 0. The process works fine when the gradient is high, but as it reaches small changes it has shown that the process will be circling around the optimal point. Try writing a limit to the while loop or make the derivative greater than a small epsilon value like 0.0001. — Dinesh.hmn, Dec 17 '16 at 20:43
What do you mean by "the output doesn't appear to be correct"? Show the expected output and the output you actually get (console output, tracebacks, graph plot, etc.). The more detail you provide, the better answers you are likely to receive. Check the [FAQ](http://stackoverflow.com/tour) and [How to Ask](http://stackoverflow.com/help/how-to-ask). — Rory Daulton, Feb 20 '17 at 11:32

Rory Daulton · Answer 1 · 2016-12-17T21:32:54.457

Your while loop stops only when a calculated floating-point value equals zero. This is naïve, since floating-point values are rarely calculated exactly. Instead, stop the loop when the calculated value is close enough to zero. Use something like

while math.abs(derivative(optimum)) > eps:

where eps is the desired precision of the calculated value. This could be made another parameter, perhaps with a default value of 1e-10 or some such.

That said, the problem in your case is worse. Your algorithm is far too naïve in assuming that the calculation

optimum = optimum - 2*derivative(optimum)

will move the value of optimum closer to the actual optimum value. In your particular case, the variable optimum just cycles back and forth between 5 (your initial guess) and -5. Note that the derivative at 5 is 10 and the derivative at -5 is -10.

So you need to avoid such cycling. You could multiply your delta 2*derivative(optimum) by something smaller than 1, which would work in your particular case y=x**2. But this will not work in general.

To be completely safe, 'bracket' your optimum point with a smaller value and a larger value, and use the derivative to find the next guess. But ensure that your next guess does not go outside the bracketed interval. If it does, or if the convergence of your guesses is too slow, use another method such as bisection or golden mean search.

Of course, this means your 'very naïve gradient descent' algorithm is too naïve to work in general. That's why real optimization routines are more complicated.

Thanks. I just tried that, but the loop keeps on going still. — Ahmad Abdelzaher, Dec 17 '16 at 20:53
Sorry I thought ppl would just run the code, I will update it with a graph shortly. — Ahmad Abdelzaher, Feb 20 '17 at 11:49

score 0 · Answer 2 · answered Dec 17 '16 at 21:26

0

You also need to decrease your step size (gamma in the gradient descent formula):

y = lambda x : x**2
dy_dx = lambda x : 2*x
def gradient_descent(function,derivative,initial_guess):
    optimum = initial_guess
    while abs(derivative(optimum)) > 0.01:
        optimum = optimum - 0.01*derivative(optimum)
        print((optimum,derivative(optimum)))
    else:
        return optimum

answered Dec 17 '16 at 21:26

rofls

4,993
3
27
37

thanks, the algorithm works, but return doesn't work, how can i make the function return the final optimum – Ahmad Abdelzaher Dec 17 '16 at 22:01

Implementing naive gradient descent in python

2 Answers2