I'm trying to write code that returns the parameters for ridge regression using gradient descent. Ridge regression is defined as

$$L(\mathbf{w}) = \sum_{i=1}^{n} \left(y_i - \mathbf{w}\cdot\mathbf{x}_i\right)^2 + \lambda\,\lVert \mathbf{w} \rVert^2$$

where $L$ is the loss (or cost) function, $\mathbf{w}$ are the parameters of the loss function (which absorb $b$), the $\mathbf{x}_i$ are the data points, $y_i$ is the label for each vector $\mathbf{x}_i$, $\lambda$ is a regularization constant (the argument C in my code), and $b$ is the intercept parameter (which is absorbed into $\mathbf{w}$).
The gradient descent algorithm that I should implement looks like this:

$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta_t\,\nabla L\!\left(\mathbf{w}^{(t)}\right)$$

where $\nabla L$ is the gradient of $L$ with respect to $\mathbf{w}$, $\eta_t$ is a step size, and $t$ is the time or iteration counter.
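Expanding $\nabla L$ from the loss above (this is the formula the grad line in my code below is meant to compute):

$$\nabla L(\mathbf{w}) = -2\sum_{i=1}^{n}\left(y_i - \mathbf{w}\cdot\mathbf{x}_i\right)\mathbf{x}_i + 2\lambda\,\mathbf{w}$$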
My code:
import numpy as np

def ridge_regression_GD(x, y, C):
    x = np.insert(x, 0, 1, axis=1)  # prepend a constant feature 1 to every row, so x is n x (d+1)
    w = np.zeros(len(x[0, :]))      # parameter vector of length d+1 (w[0] plays the role of b)
    t = 0
    eta = 1
    summ = np.zeros(1)
    grad = np.zeros(1)
    losses = np.array([0])
    loss_stry = 0
    while eta > 2**-30:
        for i in range(0, len(y)):  # accumulate the summation over all rows for the loss and the gradient
            summ = summ + (y[i,] - np.dot(w, x[i,])) * x[i,]
            loss_stry = loss_stry + (y[i,] - np.dot(w, x[i,]))**2
        losses = np.insert(losses, len(losses), loss_stry + C * np.dot(w, w))
        grad = (-2) * summ + np.dot(2 * C, w)
        eta = eta / 2               # halve the step size at every iteration
        w = w - eta * grad
        t += 1
        summ = np.zeros(1)
        loss_stry = 0
    b = w[0]
    w = w[1:]
    return w, b, losses
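For example, this is how I call it; the toy data here is made up purely to illustrate the shapes and the call, not my real dataset:

import numpy as np

# 5 points with 2 features each (hypothetical values, for illustration only)
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

w, b, losses = ridge_regression_GD(x, y, C=1.0)
print(w, b)
print(losses[:10])  # per-iteration losses; for me these keep growing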
The output should be the intercept parameter b, the vector w, and the loss at each iteration, losses.
My problem is that when I run the code I get increasing values for w and for the losses, both on the order of 10^13.
I would really appreciate it if you could help me out. If you need any more information or clarification, just ask.
NOTE: This post was deleted from Cross Validated forum. If there's a better forum to post it please let me know.