How to do a gradient descent problem (machine learning)?

Question

Could somebody please explain how to do a gradient descent problem WITHOUT the context of the cost function? I have seen countless tutorials that explain gradient descent using the cost function, but I really don't understand how it works in a more general sense.

I am given a 3D function:

z = 3*((1-xx)**2) * np.exp(-(xx**2) - (yy+1)**2) \
    - 10*(xx/5 - xx**3 - yy**5) * np.exp(-xx**2 - yy**2) - (1/3)* np.exp(-(xx+1)**2 - yy**2)

And I am asked to:

Code a simple gradient algorithm. Set the parameters as follows:

learning rate = step size: 0.1
Max number of iterations: 20
Stopping criterion: 0.0001 (Your iterations should stop when your gradient is smaller than the threshold)

Then start your algorithm at

(x0 = 0.5, y0 = -0.5)
(x0 = -0.3, y0 = -0.3)

I have seen this piece of code floating around wherever gradient descent is talked about:

def update_weights(m, b, X, Y, learning_rate):
    m_deriv = 0
    b_deriv = 0
    N = len(X)
    for i in range(N):
        # Calculate partial derivatives
        # -2x(y - (mx + b))
        m_deriv += -2*X[i] * (Y[i] - (m*X[i] + b))

        # -2(y - (mx + b))
        b_deriv += -2*(Y[i] - (m*X[i] + b))

    # We subtract because the derivatives point in direction of steepest ascent
    m -= (m_deriv / float(N)) * learning_rate
    b -= (b_deriv / float(N)) * learning_rate

    return m, b

But I don't understand how to use it for my problem. How does my function fit in there? What do I adjust instead of m and b? I'm very very confused.

Thank you.

How can it work without a cost function? You need some benchmark from one state to the next. This isn't a programming issue so is off-topic here. — roganjosh, Mar 10 '19 at 23:04
"Cost" is arbitrary. You literally just need to assign _some_ value to your solution to benchmark it against the existing best. It need never be a tangible value, just something that discriminates between your existing best solution and a new solution. "Cost" can be read as "How much it violates my criteria". — roganjosh, Mar 10 '19 at 23:07
@noahship, this `m -= (m_deriv / float(N)) * learning_rate b -= (b_deriv / float(N)) * learning_rate` looks like you are calculating the slope of `m` and `b` with Python, so you are on the right track with that piece of code. I know in JavaScript you would set up your learning rate and iterations like: `this.options = Object.assign({ learningRate: 0.1, iterations: 20 }, options);` — Daniel, Oct 20 '19 at 00:32

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

Gradient descent is an optimization algorithm for finding a minimum of a function.

Very simplified view

Let's start with a 1D function y = f(x).

Start at an arbitrary value of x and find the gradient (slope) of f at that point.

If f is decreasing at x (negative slope), the minimum lies further to the right on the number line, so we should increase x.
If f is increasing at x (positive slope), the minimum lies to the left, so we should decrease x.

We get the slope by taking the derivative of the function: the derivative is negative where f is decreasing and positive where f is increasing.

So we can start at some arbitrary value of x and move step by step toward the minimum using the derivative at the current x. How far we move on each step is controlled by the learning rate (step size), which gives the update rule

x = x - df_dx*lr

If f is decreasing, the derivative df_dx is negative, so x increases and we move to the right. If f is increasing, df_dx is positive, so x decreases and we move to the left.

We repeat this either for a fixed (large) number of iterations or until the derivative is very small.
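A minimal sketch of the 1D procedure, assuming you can write down the derivative yourself. The function name gradient_descent_1d and the example f(x) = (x - 3)**2 are hypothetical, chosen only because the minimum (x = 3) is easy to check:

```python
def gradient_descent_1d(df_dx, x, lr=0.1, tol=1e-4, max_iter=1000):
    """Minimise a 1D function, given its derivative df_dx, starting from x."""
    for _ in range(max_iter):
        slope = df_dx(x)
        if abs(slope) < tol:  # derivative is (almost) zero: stop
            break
        x = x - slope * lr  # the update rule: step against the slope
    return x


def df_dx(x):
    # Hypothetical example: f(x) = (x - 3)**2, so df/dx = 2*(x - 3)
    return 2 * (x - 3)


x_min = gradient_descent_1d(df_dx, x=-5.0)
print(x_min)  # close to 3
```

Both stopping conditions from the text are present: the loop ends after max_iter steps at most, or earlier once the derivative drops below tol.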

Multivariate function z = f(x,y)

The same logic as above applies, except that now we take the partial derivative with respect to each variable instead of a single derivative. The update rule is

x = x - dpf_dx*lr
y = y - dpf_dy*lr

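Where dpf_dx is the partial derivative of f with respect to x, and dpf_dy is the partial derivative of f with respect to y.

The above algorithm is called the gradient descent algorithm. In machine learning, f(x,y) is a cost/loss function whose minimum we are interested in, but the algorithm itself works for any differentiable function.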
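If deriving the partial derivatives by hand is inconvenient, they can be approximated numerically. The sketch below swaps in a central-difference approximation (with an assumed step h = 1e-6) and applies the two update rules. The helper names partials and gradient_descent_2d are hypothetical. Note that f is just a function of two variables, so nothing here depends on it being a "cost":

```python
def partials(f, x, y, h=1e-6):
    """Approximate (df/dx, df/dy) with central differences (h is an assumed step)."""
    dpf_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dpf_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dpf_dx, dpf_dy


def gradient_descent_2d(f, x, y, lr=0.1, tol=1e-4, max_iter=1000):
    """Apply x -= dpf_dx*lr and y -= dpf_dy*lr until both partials are small."""
    for _ in range(max_iter):
        dpf_dx, dpf_dy = partials(f, x, y)
        if abs(dpf_dx) < tol and abs(dpf_dy) < tol:
            break
        x -= dpf_dx * lr
        y -= dpf_dy * lr
    return x, y


def bowl(x, y):
    # Any function of two variables works; this one has its minimum at (1, 2)
    return (x - 1)**2 + (y - 2)**2


print(gradient_descent_2d(bowl, 10.0, 10.0))
```

Compared with the question's update_weights, the variables being adjusted are x and y instead of m and b, and the function being minimised is f itself instead of a mean squared error.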

Example

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fmin

def z_func(a):
    x, y = a
    return (x - 1)**2 + (y - 2)**2

x = np.arange(-3.0, 3.0, 0.1)
y = np.arange(-3.0, 3.0, 0.1)
X, Y = np.meshgrid(x, y)  # grid of points
Z = z_func((X, Y))  # evaluate the function on the grid

fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent Matplotlib versions
ax = fig.add_subplot(projection='3d')
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, linewidth=0, antialiased=False)
plt.show()

The minimum of z_func is at (1, 2). This can be verified with SciPy's fmin function:

fmin(z_func,np.array([10,10]))

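Now let's write our own gradient descent algorithm to find the minimum of z_func. Its partial derivatives are dz/dx = 2(x - 1) and dz/dy = 2(y - 2):

def gradient_descent(x, y, lr, tol=0.0001, max_iter=1000):
    for _ in range(max_iter):  # cap the iterations so the loop always terminates
        # partial derivatives of z_func
        d_x = 2*(x-1)
        d_y = 2*(y-2)

        x -= d_x*lr
        y -= d_y*lr

        # stop once both partial derivatives are close to zero
        if abs(d_x) < tol and abs(d_y) < tol:
            break
    return x, y

print(*gradient_descent(10, 10, 0.1))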

We start at the arbitrary point x=10, y=10 with a learning rate of 0.1. The code above prints 1.000033672997724 2.0000299315535326, which is within about 3e-5 of the true minimum (1, 2).

So if you have a continuous, differentiable convex function, finding its optimum (which for a convex function is its minimum) only requires the partial derivatives of the function with respect to each variable and the update rule above, provided the learning rate is small enough. Repeat the steps until the gradients are small, which for a convex function means we have reached the minimum.
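Applying that recipe to the function from the question might look like the sketch below. The partial derivatives in grad_f were derived by hand (worth checking against a numerical estimate such as central differences), and the names f, grad_f and gradient_descent are hypothetical. Reading "gradient smaller than the threshold" as the Euclidean norm of (dpf_dx, dpf_dy) is an assumption; checking each partial separately, as in the z_func example, is another valid reading.

```python
import numpy as np


def f(x, y):
    """The function from the question (MATLAB's 'peaks' surface)."""
    return (3 * (1 - x)**2 * np.exp(-x**2 - (y + 1)**2)
            - 10 * (x / 5 - x**3 - y**5) * np.exp(-x**2 - y**2)
            - (1 / 3) * np.exp(-(x + 1)**2 - y**2))


def grad_f(x, y):
    """Hand-derived partial derivatives (dpf_dx, dpf_dy) of f."""
    a = np.exp(-x**2 - (y + 1)**2)
    b = np.exp(-x**2 - y**2)
    c = np.exp(-(x + 1)**2 - y**2)
    inner = x / 5 - x**3 - y**5
    dpf_dx = ((-6 * (1 - x) - 6 * x * (1 - x)**2) * a
              + (-10 * (1 / 5 - 3 * x**2) + 20 * x * inner) * b
              + (2 / 3) * (x + 1) * c)
    dpf_dy = (-6 * (1 - x)**2 * (y + 1) * a
              + (50 * y**4 + 20 * y * inner) * b
              + (2 / 3) * y * c)
    return dpf_dx, dpf_dy


def gradient_descent(x, y, lr=0.1, max_iter=20, tol=1e-4):
    """Assignment parameters: step 0.1, at most 20 iterations, stop if |grad| < tol."""
    for _ in range(max_iter):
        dpf_dx, dpf_dy = grad_f(x, y)
        if np.hypot(dpf_dx, dpf_dy) < tol:  # gradient norm below the threshold
            break
        x -= dpf_dx * lr
        y -= dpf_dy * lr
    return x, y, f(x, y)


for x0, y0 in [(0.5, -0.5), (-0.3, -0.3)]:
    x, y, z = gradient_descent(x0, y0)
    print(f"start ({x0}, {y0}) -> x = {x:.4f}, y = {y:.4f}, z = {z:.4f}")
```

With only 20 iterations the gradient may not fall below 0.0001 from either start, in which case the loop simply stops at the iteration limit.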

If the function is not convex, we might get stuck in a local minimum, and which minimum we end up in depends on the starting point.
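To illustrate the dependence on the starting point, here is a minimal sketch with a hypothetical non-convex function f(x) = x**4 - 3*x**2 + x, which has a local minimum near x = 1.13 and a lower (global) minimum near x = -1.30. The helper descend and the learning rate 0.01 are illustrative choices, not part of the original answer:

```python
def df_dx(x):
    # Derivative of the hypothetical f(x) = x**4 - 3*x**2 + x
    return 4 * x**3 - 6 * x + 1


def descend(grad, x, lr=0.01, tol=1e-4, max_iter=10_000):
    """Plain 1D gradient descent: x -= grad(x)*lr until the slope is tiny."""
    for _ in range(max_iter):
        slope = grad(x)
        if abs(slope) < tol:
            break
        x -= slope * lr
    return x


print(descend(df_dx, 2.0))   # ends near the local minimum around x = 1.13
print(descend(df_dx, -2.0))  # ends near the global minimum around x = -1.30
```

Both runs satisfy the same stopping criterion, yet only the second finds the lowest point; the algorithm has no way of telling the two apart.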
