
I am working with this dataset and I want to plot the training error as a function of the number of iterations of gradient descent with square loss. Here is my attempt:

import numpy as np
import scipy.io
import matplotlib.pyplot as plt
mat = scipy.io.loadmat('data_orsay_2017.mat')
x=mat['Xtrain']
y=mat['ytrain']
                
def gradientdescent(x,y,n,alpha,max_iterations):  # n is the sample size, alpha is the learning rate
    d = x.shape[1] # dimension of the data
    theta = np.random.random(d) 
    error = []  
    for j in range(max_iterations):
        prediction = x.dot(theta)
        cost = 1/(2*n) * sum((y[i,0] - prediction[i])**2 for i in range(n))
        error.append(cost)
        grad = (1/n) * sum((prediction[i] - y[i,0])*x[i] for i in range(n))
        theta -= alpha * grad
    return (theta,error)
       
plt.xlabel("iterations")
plt.ylabel("training error for square loss")
plt.plot(range(1000), gradientdescent(x, y, 1000, 1, 1000)[1])
plt.show()

However, it seems that the cost does not converge to zero (in an example run with 1000 iterations it doesn't go below 0.27). Is there something wrong with my algorithm?
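For completeness, here is a vectorized rewrite of the same loop (just a sketch, assuming the same Xtrain / ytrain arrays and shapes as above); as far as I can tell it computes the same cost and gradient, only without the Python-level sums:

import numpy as np

def gradientdescent_vec(x, y, alpha, max_iterations):
    # same batch gradient descent on the squared loss, fully vectorized
    n, d = x.shape
    y = y.ravel()                    # flatten (n,1) -> (n,) so broadcasting behaves
    theta = np.random.random(d)
    error = []
    for _ in range(max_iterations):
        prediction = x.dot(theta)            # shape (n,)
        residual = prediction - y            # shape (n,)
        cost = residual.dot(residual) / (2*n)
        error.append(cost)
        grad = x.T.dot(residual) / n         # shape (d,)
        theta -= alpha * grad
    return theta, error

# e.g. reproducing the call above on the first 1000 samples:
# theta, error = gradientdescent_vec(x[:1000], y[:1000], alpha=1, max_iterations=1000)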

  • Is the training set that you are using perfectly separable? If not, then you will never reach 0 cost using any training method. – subspring Feb 26 '22 at 18:06
  • It's reassuring to know that it doesn't always have to converge to zero, thanks. I have no clue if this dataset is "perfectly separable" though, I only have the data (see the sketch after these comments for one way to check that directly). – Skywear Feb 26 '22 at 19:11
  • One more thing to know: even if the training data is perfectly separable, that doesn't mean a 0-cost model is desirable, because the model could be overfit to the training data, and when testing it you might find that it does poorly even though the training cost is 0. So you should consider doing cross-validation to reduce the probability of that. – subspring Feb 27 '22 at 06:46
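A minimal way to check that floor directly (a sketch, assuming the same data_orsay_2017.mat file and the same cost definition as in the question) is to compare against the exact least-squares solution; if its cost is also around 0.27, gradient descent has essentially converged and the training set simply cannot be fit exactly:

import numpy as np
import scipy.io

mat = scipy.io.loadmat('data_orsay_2017.mat')
x = mat['Xtrain'][:1000]           # same first n = 1000 samples as in the question
y = mat['ytrain'][:1000].ravel()
n = x.shape[0]

# exact minimizer of ||x.dot(theta) - y||^2
theta_star, *_ = np.linalg.lstsq(x, y, rcond=None)

# same cost as in the question: (1/(2n)) * sum of squared residuals
residual = x.dot(theta_star) - y
print("minimum attainable training cost:", residual.dot(residual) / (2*n))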
