I am trying to understand the Gradient Descent Algorithm.
The code below is supposed to take a current line of best fit and return a slightly better one. The function takes the current line's slope and y-intercept as inputs, along with a 2-D data set named "points" and a learningRate. This is the code I am working with:
def step_gradient(b_current, m_current, points, learningRate):
    b_gradient = 0  # initialize the gradient with respect to b to 0
    m_gradient = 0  # initialize the gradient with respect to m to 0
    N = float(len(points))  # let N be the number of data points
    for i in range(0, len(points)):  # iterate through the data set "points"
        x = points[i, 0]  # x-coordinate of the i-th point
        y = points[i, 1]  # y-coordinate of the i-th point
        # accumulate this point's contribution to each partial derivative
        b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
        m_gradient += -(2/N) * x * (y - ((m_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)  # step b against its gradient
    new_m = m_current - (learningRate * m_gradient)  # step m against its gradient
    return [new_b, new_m]
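For context, here is roughly how I am calling it. This is a minimal driver I wrote myself to experiment, not from the tutorial; the data, learning rate, and iteration count are made-up placeholders. I am also assuming points is a NumPy array, since the points[i, 0] indexing would fail on a plain list of lists:

import numpy as np

# Toy data, roughly y = 2x; purely illustrative values.
points = np.array([[1.0, 2.0],
                   [2.0, 4.1],
                   [3.0, 5.9],
                   [4.0, 8.2]])

b, m = 0.0, 0.0          # start from the line y = 0
learning_rate = 0.01     # placeholder hyperparameter
num_iterations = 1000    # placeholder hyperparameter

# Repeatedly replace the current line with the "slightly better" one.
for _ in range(num_iterations):
    b, m = step_gradient(b, m, points, learning_rate)

print(b, m)  # should approach the least-squares fit for these points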
However, I do not understand what is happening inside the for loop.
I understand that the first two lines of the loop assign x and y to the coordinates of the next data point in "points" on each iteration.
What I do not understand is how b_gradient and m_gradient are being calculated.
To my understanding, b_gradient is the sum of the partial derivatives with respect to b over every point in the data set. My real question is: how does the line
b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
calculate the partial derivative with respect to b?
What is the -(2/N) factor for?
Can someone please explain how on earth this line of code represents the partial derivative, with respect to b, of the error at a point in this data set?
I have the same confusion about m_gradient.
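For reference, here is the derivation as I currently understand it. I am assuming the cost being minimized is the mean squared error; the code never says so explicitly, so that part is my guess:

E(m, b) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (m x_i + b) \right)^2

\frac{\partial E}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} 2 \left( y_i - (m x_i + b) \right) \cdot (-1) = -\frac{2}{N} \sum_{i=1}^{N} \left( y_i - (m x_i + b) \right)

\frac{\partial E}{\partial m} = \frac{1}{N} \sum_{i=1}^{N} 2 \left( y_i - (m x_i + b) \right) \cdot (-x_i) = -\frac{2}{N} \sum_{i=1}^{N} x_i \left( y_i - (m x_i + b) \right)

If that is right, then each loop iteration adds one term of these sums, and the -(2/N) comes from the chain rule (the exponent 2 times the inner derivative, -1 for b and -x_i for m) combined with the 1/N averaging factor. Is that the correct reading of the code?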