Gradient Descent is an optimization algorithm for finding the minimum of a function.
Very simplified view
Let's start with a 1D function y = f(x).
We pick an arbitrary value of x and find the gradient (slope) of f(x) at that point.
If the function is decreasing at x (negative slope), the minimum lies to the right on the number line, so we have to increase x.
If the function is increasing at x (positive slope), the minimum lies to the left, so we have to decrease x.
We can get the slope by taking the derivative of the function. The derivative is -ve where the function is decreasing and +ve where it is increasing.
So we can start at some arbitrary value of x and slowly move toward the minimum using the derivative at that value of x. How big each step is is determined by the learning rate or step size (lr). So we have the update rule
x = x - df_dx*lr
We can see that if the function is decreasing, the derivative (df_dx) is -ve, so the update increases x and we move to the right. On the other hand, if the function is increasing, df_dx is +ve, so the update decreases x and we move to the left.

We continue this either for some fixed (large) number of iterations or until the derivative is very small.
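To make the 1D case concrete, here is a minimal sketch of this loop, using f(x) = (x - 3)**2 as a made-up example function (its minimum is at x = 3; the function and names here are just illustrative):
def f(x):
    return (x - 3)**2        # example function, minimum at x = 3

def df_dx(x):
    return 2*(x - 3)         # derivative of f

def gradient_descent_1d(x, lr, steps):
    for _ in range(steps):
        x = x - df_dx(x)*lr  # the update rule
    return x

print(gradient_descent_1d(10, 0.1, 100))   # prints a value very close to 3.0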
Multivariate function z = f(x,y)
The same logic as above applies, except now we take partial derivatives instead of a single derivative.
The update rule is
x = x - dpf_dx*lr
y = y - dpf_dy*lr
where dpf_dx is the partial derivative of f with respect to x, and dpf_dy is the partial derivative with respect to y.
The above algorithm is called the gradient descent algorithm. In machine learning, f(x,y) is a cost/loss function whose minimum we are interested in.
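As a quick sanity check of the update rule, here is one hand-worked step on the made-up function f(x,y) = x**2 + y**2 (minimum at (0, 0)), starting at (3, 4) with lr = 0.1:
x, y = 3.0, 4.0
lr = 0.1

dpf_dx = 2*x          # partial derivative of x**2 + y**2 w.r.t. x -> 6.0
dpf_dy = 2*y          # partial derivative w.r.t. y -> 8.0

x = x - dpf_dx*lr     # 3.0 - 0.6 = 2.4
y = y - dpf_dy*lr     # 4.0 - 0.8 = 3.2

print(x, y)           # moves from (3, 4) to approximately (2.4, 3.2), one step closer to the minimum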
Example
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fmin

def z_func(a):
    # unpack the (x, y) pair and evaluate (x-1)^2 + (y-2)^2
    x, y = a
    return (x - 1)**2 + (y - 2)**2

x = np.arange(-3.0, 3.0, 0.1)
y = np.arange(-3.0, 3.0, 0.1)
X, Y = np.meshgrid(x, y)   # grid of points
Z = z_func((X, Y))         # evaluation of the function on the grid

fig = plt.figure()
ax = fig.add_subplot(projection='3d')   # create 3D axes for the surface plot
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, linewidth=0, antialiased=False)
plt.show()
The minimum of z_func is at (1, 2). This can be verified using the fmin function of scipy:
fmin(z_func, np.array([10, 10]))
Now let's write our own gradient descent algorithm to find the minimum of z_func.
def gradient_descent(x, y, lr):
    while True:
        # partial derivatives of (x-1)^2 + (y-2)^2
        d_x = 2*(x - 1)
        d_y = 2*(y - 2)
        x -= d_x*lr
        y -= d_y*lr
        # stop when both partial derivatives are very small
        if abs(d_x) < 0.0001 and abs(d_y) < 0.0001:
            break
    return x, y

print(gradient_descent(10, 10, 0.1))
We start at arbitrary values x = 10 and y = 10 with a learning rate of 0.1. The above code returns x = 1.000033672997724 and y = 2.0000299315535326, which is correct to within the stopping tolerance.
So if you have a continuously differentiable convex function, to find its optimum (which is the minimum for a convex function) all you have to do is find the partial derivatives of the function with respect to each variable and apply the update rule mentioned above. Repeat the steps until the gradients are small, which means we have reached the minimum of a convex function.
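Here is a minimal sketch of that recipe written generically, where the gradient is passed in as a function returning the vector of partial derivatives (the function name and interface are just illustrative):
import numpy as np

def gradient_descent_nd(grad, x0, lr=0.1, tol=1e-4, max_iters=10000):
    # grad: function returning the vector of partial derivatives at a point
    # x0: starting point, lr: learning rate, tol: stopping tolerance
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        x = x - lr*g                   # the update rule, applied to every variable at once
        if np.all(np.abs(g) < tol):    # stop when all partial derivatives are small
            break
    return x

# reusing z_func's gradient: partial derivatives of (x-1)^2 + (y-2)^2
print(gradient_descent_nd(lambda p: np.array([2*(p[0] - 1), 2*(p[1] - 2)]), [10, 10]))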
If the function is not convex, we might get stuck in a local optimum.
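For example, here is a quick sketch using a made-up non-convex 1D function f(x) = x**4 - 3*x**2 + x, which has two minima; depending on where we start, gradient descent settles into either the global or a local minimum:
def df_dx(x):
    return 4*x**3 - 6*x + 1   # derivative of x**4 - 3*x**2 + x

def gd_1d(x, lr=0.01, steps=1000):
    for _ in range(steps):
        x = x - df_dx(x)*lr
    return x

print(gd_1d(-2.0))   # converges near x ≈ -1.30, the global minimum
print(gd_1d(2.0))    # converges near x ≈ 1.13, a local minimum only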