
I am trying to implement the gradient descent algorithm using python and following is my code,

import numpy as np
import time

def grad_des(xvalues, yvalues, R=0.01, epsilon=0.0001, MaxIterations=1000):
    xvalues = np.array(xvalues)
    yvalues = np.array(yvalues)
    length = len(xvalues)
    alpha = 1
    beta = 1
    converged = False
    i=0
    cost = sum([(alpha + beta*xvalues[i] - yvalues[i])**2 for i in range(length)]) / (2 * length)
    start_time = time.time()
    while not converged:      
        alpha_deriv = sum([(alpha + beta*xvalues[i] - yvalues[i]) for i in range(length)]) / (length)
        beta_deriv =  sum([(alpha + beta*xvalues[i] - yvalues[i])*xvalues[i] for i in range(length)]) / (length)
        alpha = alpha - R * alpha_deriv
        beta = beta - R * beta_deriv
        new_cost = sum( [ (alpha + beta*xvalues[i] - yvalues[i])**2 for i in range(length)] )  / (2*length)
        if abs(cost - new_cost) <= epsilon:
            print 'Converged'
            print 'Number of Iterations:', i
            converged = True
        cost = new_cost
        i = i + 1      
        if i == MaxIterations:
            print 'Maximum Iterations Exceeded'
            converged = True
    print "Time taken: " + str(round(time.time() - start_time,2)) + " seconds"
    return alpha, beta

This code is working fine. But the problem is that it takes more than 25 seconds for approximately 600 iterations. I feel this is not efficient enough, so I tried converting the inputs to arrays before doing the calculations. That did reduce the time from 300 seconds to 25 seconds. Still, I feel it can be reduced further. Can anybody help me improve this algorithm?

Thanks

haimen
  • There are various things wrong here but I can't reproduce the specific problem with slowness. What is the nature of your input (xvalues and yvalues)? – Jason S Feb 16 '16 at 00:17
  • @JasonS Can I know what the mistakes are? It is actually a dataframe with 506 values. For now I am using the built-in Boston dataset – haimen Feb 16 '16 at 00:18
  • Commented with some potential items. Also, what is the range of inputs? When I put in anything bigger than 20 or so I get overflow errors. – Jason S Feb 16 '16 at 00:23
  • I have 506 rows in my input – haimen Feb 16 '16 at 00:33
  • I meant the range of the values. If I try it with anything but mostly single-digit numbers it overflows, and otherwise it takes less than a second. – Jason S Feb 16 '16 at 00:40
  • Oh yeah my values are two digits mostly – haimen Feb 16 '16 at 00:46
  • I ran 500 values of y=(x-8)^2-3 and it runs in 1.59 seconds with almost 900 iterations. That's a 1D case, is the algorithm meant to handle higher dimensional surfaces? What are alpha and beta supposed to be? – wrkyle Feb 16 '16 at 00:54
  • The number of iterations for me is around 13000; that is the reason it takes a lot of time. But I thought we could still improve this. – haimen Feb 16 '16 at 01:00
  • What are alpha and beta? In your cost function you're using them as parameters of a straight line and calculating the sum of differences. Can you explain alpha, beta, alpha_deriv, and beta_deriv? – wrkyle Feb 16 '16 at 01:04
  • alpha and beta are the regression coefficients. alpha_deriv and beta_deriv are their derivatives, used to update them for the next iteration; the idea here is to reduce the cost function – haimen Feb 16 '16 at 01:08
  • Oh! Are you using gradient descent to find the least squares fit to the data? – wrkyle Feb 16 '16 at 01:10
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/103550/discussion-between-wrkyle-and-haimen). – wrkyle Feb 16 '16 at 01:15

2 Answers


As I commented I can't reproduce the slowness, however here are some potential issues:

  1. It looks like length does not change, but you are repeatedly invoking range(length). In Python 2.x, range creates a list, and doing this repeatedly can slow things down (object creation is not cheap). Use xrange (or import a Py3-compatible iterator range from six or future) and create the range once up front rather than each time.

  2. i is being reused here in a way that could cause problems. You're trying to use it as the overall iteration count, but each of your list comprehensions that uses i will overwrite i in the scope of the function, which means that the "iteration" count will always end up as length - 1.
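Both points can be sketched together, assuming Python 2-style code (in Python 3, range is already lazy and comprehension variables no longer leak): build the index sequence once outside the loop, and give the comprehension its own loop variable so the iteration counter survives.

```python
# Sketch of both fixes: the index sequence is created once (use xrange
# on Python 2), and the comprehension uses a distinct variable `j` so
# the iteration counter `i` is never clobbered.
xvalues = [1.0, 2.0, 3.0]
yvalues = [2.0, 4.0, 6.0]
length = len(xvalues)
idx = range(length)          # created once, reused every iteration
alpha, beta = 1.0, 1.0
i = 0                        # iteration counter, safe from the comprehension
alpha_deriv = sum(alpha + beta * xvalues[j] - yvalues[j] for j in idx) / length
i = i + 1
```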

Jason S
  • I've tried what you have said, but it hasn't improved by any means – haimen Feb 16 '16 at 00:34
  • These suggestions are all I can see to improve the efficiency of *this* algorithm (gradient descent to find the least squares coefficients for a straight line). These types of algorithms are just slow and don't start to pull ahead of other methods until you have ugly, high-dimensional problems. I think this answer is the best answer you're going to get. The real bottleneck is the fact that you have two sums over N in each step but that can hardly be helped. – wrkyle Feb 16 '16 at 01:33

The lowest hanging fruit that I can see is in vectorization. You have a lot of list comprehensions; they're faster than for loops but have nothing on proper usage of numpy arrays.

def grad_des_vec(xvalues, yvalues, R=0.01, epsilon=0.0001, MaxIterations=1000):
    xvalues = np.array(xvalues)
    yvalues = np.array(yvalues)
    length = len(xvalues)
    alpha = 1
    beta = 1
    converged = False
    i = 0
    cost = np.sum((alpha + beta * xvalues - yvalues)**2) / (2 * length)
    start_time = time.time()
    while not converged:
        alpha_deriv = np.sum(alpha + beta * xvalues - yvalues) / length
        beta_deriv = np.sum(
            (alpha + beta * xvalues - yvalues) * xvalues) / length
        alpha = alpha - R * alpha_deriv
        beta = beta - R * beta_deriv
        new_cost = np.sum((alpha + beta * xvalues - yvalues)**2) / (2 * length)
        if abs(cost - new_cost) <= epsilon:
            print('Converged')
            print('Number of Iterations:', i)
            converged = True
        cost = new_cost
        i = i + 1
        if i == MaxIterations:
            print('Maximum Iterations Exceeded')
            converged = True
    print("Time taken: " + str(round(time.time() - start_time, 2)) + " seconds")
    return alpha, beta

For comparison:

In[47]: grad_des(xval, yval)
Converged
Number of Iterations: 198
Time taken: 0.66 seconds
Out[47]: 
(0.28264882215511067, 0.53289263416071131)

In [48]: grad_des_vec(xval, yval)
Converged
Number of Iterations: 198
Time taken: 0.03 seconds
Out[48]: 
(0.28264882215511078, 0.5328926341607112)

That's about a factor-of-20 speedup (xval and yval were both 1024-element arrays).
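A further micro-optimization is possible (a sketch, not benchmarked here): the loop body above evaluates `alpha + beta * xvalues - yvalues` twice before the parameter update. Computing that residual vector once and reusing it for both gradients saves one pass over the data per iteration, and `dot` avoids an intermediate array for the beta gradient.

```python
import numpy as np

# One iteration's gradient step, reusing the residual vector for both
# derivatives instead of recomputing alpha + beta*xvalues - yvalues twice.
xvalues = np.array([1.0, 2.0, 3.0, 4.0])
yvalues = np.array([2.0, 4.0, 6.0, 8.0])
length = len(xvalues)
alpha, beta, R = 1.0, 1.0, 0.01

residual = alpha + beta * xvalues - yvalues   # one pass over the data
alpha_deriv = residual.sum() / length
beta_deriv = residual.dot(xvalues) / length   # dot avoids a temporary array
alpha -= R * alpha_deriv
beta -= R * beta_deriv
```

The cost still has to be recomputed afterwards with the updated alpha and beta, so this trims one of the three evaluations per iteration, not two.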

Elliot