I have been trying to code a gradient descent algorithm from scratch for multi-feature linear regression, but when I predict on my own training dataset, the results come out suspiciously accurate.
import numpy as np

class gradientdescent:
    def fit(self, X, Y):
        lr = 0.005                     # learning rate
        b = 0
        M = np.ones(X.shape[1])        # initial weights, one per feature
        n = np.size(X, 0)              # number of training samples
        for i in range(10000):
            grad_M = 0
            grad_b = 0
            for j in range(n):
                error = np.dot(X[j], M) + b - Y[j]
                grad_M = grad_M + error * X[j]   # accumulate weight gradient
                grad_b = grad_b + error          # accumulate bias gradient
            M = M - lr * grad_M / n
            b = b - lr * grad_b / n
        self.b = b
        self.M = M
        self.n = n
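For reference, I believe the inner j loop computes the same accumulated gradients as this vectorized form (just a sketch of my understanding; these three lines would replace the loop body inside fit):

errors = X @ M + b - Y    # residuals for all n samples, shape (n,)
grad_M = X.T @ errors     # equals the loop's accumulated grad_M
grad_b = errors.sum()     # equals the loop's accumulated grad_b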
The dataset below is completely arbitrary; I entered the values in the X and Y arrays at random.
X=np.array([[1,2,3,4,5],[2,1,4,3,5],[1,3,2,5,4],[3,0,1,2,4],[0,1,2,4,3]])
Y=np.array([5,6,2,8,100])
My prediction function:
def predict(self, X):
    # loop over the rows of the X passed in (not self.n, which is
    # the size of the training set)
    for i in range(X.shape[0]):
        print(np.dot(self.M, X[i]) + self.b)
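For completeness, this is how I drive it (the variable name gd is just for illustration; I predict on the training set itself):

gd = gradientdescent()
gd.fit(X, Y)
gd.predict(X)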
The predicted values:
5.000000000080892
5.999999999956618
1.9999999999655422
8.000000000004814
99.99999999998795
There is no way the fitted model should pass this close to every training point: the data was random, so I expected at least some error. I even tried changing the data, but it still gives near-exact results.
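One thing I notice: with 5 weights plus a bias, the model has 6 free parameters but only 5 training rows, so maybe an exact fit always exists? To check whether my loop itself is at fault, a fair sanity check seems to be comparing against NumPy's closed-form least-squares solution; a sketch using the X and Y above (np.linalg.lstsq is the actual NumPy routine, the rest of the naming is mine):

Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column of ones
coef, residuals, rank, _ = np.linalg.lstsq(Xb, Y, rcond=None)
print(Xb @ coef)   # if this also reproduces Y exactly, the data is
                   # simply interpolatable, not my gradient descent failing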
Please tell me if there is a problem with my algorithm.