Linear regression and gradient descent from scratch python

Question

I am trying to run the following from-scratch linear regression code. When I create an instance of my linear regression class and call its `fit` method, I get a ValueError.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('/Users/MyName/Downloads/archive/prices.csv')
X = df['volume'].values
y = df['close'].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

class Lin_Reg():
    def __init__(self, lr=0.01, n_iters=10000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        
        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias

            dw = (1/n_samples) * np.dot(X, (y_pred - y))
            db = (1/n_samples) * np.sum(y_pred-y)

            self.weight = self.weight -self.lr * dw
            self.bias = self.bias -self.lr * db
    
    def predict(self, X):
        y_pred = np.dot(X, self.weights) + self.bias
        return y_pred

reg = Lin_Reg()
reg.fit(X_train, y_train)
predictions = reg.predict(X_test)

The error message is

ValueError: not enough values to unpack (expected 2, got 1)

and the line generating this error is n_samples, n_features = X.shape

The dataset I'm working with can be found here: https://www.kaggle.com/datasets/dgawlik/nyse. I am using the prices.csv file.

If you call `predict` without first calling `fit`, then `self.weights == self.bias == None` - and so the line `y_pred = np.dot(X, self.weights) + self.bias` throws the error you're seeing. — slothrop, Mar 10 '23 at 20:55
@slothrop I've fixed that. I've edited the question to show what my current issue is now. — Gustavo, Mar 12 '23 at 01:37

score 0 · Answer 1 · answered Mar 12 '23 at 01:45

The problem is at this line:

X = df['volume'].values

This selects a single column, so `X` is a 1-D array of shape (N,), where N is the number of rows. `X.shape` is therefore a one-element tuple, and unpacking it into two names raises the error at this line:

n_samples, n_features = X.shape

In your case, you can just do:

n_samples, n_features = len(X), 1
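
As an alternative to hard-coding the feature count, the column can be reshaped into a 2-D array so that `X.shape` unpacks the way the original `fit` expects. Below is a minimal sketch of that idea; the small array is made up and only stands in for `df['volume'].values`.

```python
import numpy as np

# Stand-in for df['volume'].values (made-up numbers, not the real data)
X = np.array([10.0, 20.0, 30.0, 40.0])
print(X.shape)  # a one-element tuple, so it cannot be unpacked into two names

# Reshape to a single-feature 2-D array: one row per sample, one column
X_2d = X.reshape(-1, 1)
n_samples, n_features = X_2d.shape
print(n_samples, n_features)

# With a 2-D X, the dot product with a (n_features,) weight vector is well defined
weights = np.zeros(n_features)
y_pred = np.dot(X_2d, weights) + 0.0
print(y_pred.shape)
```

With pandas, selecting the column with double brackets, `df[['volume']].values`, should give the same (N, 1) shape directly.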

answered by Minh-Long Luu

I did that and now I'm getting the error "ValueError: shapes (766137,) and (1,) not aligned: 766137 (dim 0) != 1 (dim 0)", and it points at the line `y_pred = np.dot(X, self.weights) + self.bias`. – Gustavo Mar 12 '23 at 15:02
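
This follow-up error appears to come from `X` still being 1-D while `self.weights` has shape (1,). The sketch below shows one way to get the class running, assuming a single feature reshaped to (N, 1). It also changes two other lines in the question's `fit`: the gradient uses `X.T` so the shapes align, and the update assigns to `self.weights` rather than `self.weight`. The data is synthetic, and the standardization step is an assumption added here, since raw volume values are large enough that a learning rate of 0.01 would likely diverge.

```python
import numpy as np

class Lin_Reg:
    def __init__(self, lr=0.01, n_iters=10000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape  # requires a 2-D X
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias
            error = y_pred - y
            # X.T has shape (n_features, n_samples), so dw has shape (n_features,)
            dw = (1 / n_samples) * np.dot(X.T, error)
            db = (1 / n_samples) * np.sum(error)
            self.weights = self.weights - self.lr * dw
            self.bias = self.bias - self.lr * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Synthetic single-feature data (hypothetical, stands in for volume/close)
rng = np.random.default_rng(0)
x_raw = rng.uniform(1e5, 1e7, size=200)
y = 3e-6 * x_raw + 2.0

# Reshape to (N, 1) and standardize so that lr=0.01 stays stable
X = x_raw.reshape(-1, 1)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

reg = Lin_Reg(lr=0.01, n_iters=5000)
reg.fit(X_scaled, y)
predictions = reg.predict(X_scaled)
print(np.max(np.abs(predictions - y)))  # largest absolute error on the training data
```

On real data, the mean and standard deviation computed from `X_train` would need to be reused to scale `X_test` before calling `predict`.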
