
In sklearn, to train a linear regression model with the fit method, we have to reshape 1D arrays. But in the case of linear regression with multiple variables, I got output without reshaping the target variable.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("Weather.csv", low_memory=False)  # the data set I used
df1 = df[["Precip", "MaxTemp"]]

# No error: df1.head() has shape (5, 2) and df.MinTemp.head() has shape (5,)
reg = LinearRegression().fit(df1.head(), df.MinTemp.head())

What is the reason behind this? Thanks in advance.
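A minimal sketch that reproduces the shapes from the question without Weather.csv. The DataFrame below is a hypothetical stand-in with made-up values that only reuses the column names Precip, MaxTemp and MinTemp:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for Weather.csv: made-up values, same column names
df = pd.DataFrame({
    "Precip": [0.0, 1.2, 0.5, 3.1, 0.0],
    "MaxTemp": [25.5, 28.1, 26.0, 22.3, 30.2],
    "MinTemp": [15.0, 17.2, 16.1, 12.8, 19.4],
})

X = df[["Precip", "MaxTemp"]]  # double brackets -> DataFrame, 2D, shape (5, 2)
y = df["MinTemp"]              # single brackets -> Series, 1D, shape (5,)
print(X.shape, y.shape)        # (5, 2) (5,)

reg = LinearRegression().fit(X, y)
print(reg.coef_.shape)         # (2,): one coefficient per feature

# A single feature selected with double brackets is still 2D, shape (5, 1)
X_one = df[["Precip"]]
print(X_one.shape)             # (5, 1)
```

The features come out 2D because of the double-bracket selection, while the target is a 1D Series. Selecting a single feature with df["Precip"] would give a 1D Series that fit rejects, whereas df[["Precip"]] keeps it 2D.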

    Fit takes df1.head() as X (features) and df.MinTemp.head() as y (target values). Can you show us exactly where the error occurs? Please have a look at https://stackoverflow.com/questions/45704226/what-does-fit-method-in-scikit-learn-do and show your error. – nithin Nov 15 '19 at 10:23

1 Answer


Please look at this example:

from sklearn.linear_model import LinearRegression
import numpy as np

Xr = np.random.randint(0, 10, 4)  # random 1D array -> shape is (4,)

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])  # 2D array -> shape is (4, 2)
y = np.dot(X, np.array([1, 2])) + 3  # 1D array -> shape is (4,)

regr = LinearRegression()
# regr.fit(Xr, y)  # this would raise a ValueError because Xr is 1D
regr.fit(X,y)
regr.predict(X) # returns array([ 6.,  8.,  9., 11.])

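If you fit the model using X and y, you will not get an error even though y has shape (4,). This is because fit expects X to be a 2D array of shape (n_samples, n_features), while y may be a 1D array of shape (n_samples,) (or 2D of shape (n_samples, n_targets) when there are several targets). Here X has shape (4, 2), i.e. 4 samples with 2 features each, so both inputs are valid. (sklearn will also cast the targets to X's dtype if necessary, but that concerns the dtype, not the shape.)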
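To illustrate that the target may be either 1D or a 2D column, here is a small sketch reusing X and y from the example above. The names reg_1d and reg_2d are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])  # shape (4, 2): 4 samples, 2 features
y = np.dot(X, np.array([1, 2])) + 3             # shape (4,): one target per sample

# Target as a 1D array of shape (n_samples,)
reg_1d = LinearRegression().fit(X, y)
print(reg_1d.coef_.shape)  # (2,)

# The same target as a 2D column of shape (n_samples, 1) is also accepted
reg_2d = LinearRegression().fit(X, y.reshape(-1, 1))
print(reg_2d.coef_.shape)  # (1, 2)

# Predictions agree; only the output shape follows the shape of y
print(reg_1d.predict(X).shape, reg_2d.predict(X).shape)  # (4,) (4, 1)
```

Only X has to be 2D. The shape of y just decides whether coef_ and the predictions come back 1D or 2D.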

If you try to fit Xr and y, both are 1D arrays, and you will get an error saying "Expected 2D array, got 1D array instead". In this case your training data Xr must be a 2D array with one column, i.e. shape (4, 1), which you can get with Xr.reshape(-1, 1). The target y can stay 1D; it does not need to be reshaped.
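A short sketch of the single-feature case. It uses fixed example values for Xr instead of random ones, so the run is repeatable. The exact wording of the error message can vary between sklearn versions, so only its first line is printed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

Xr = np.array([3, 7, 1, 5])   # 1D feature array, shape (4,) (example values)
y = np.array([6, 8, 9, 11])   # 1D target, shape (4,)

try:
    LinearRegression().fit(Xr, y)  # a 1D X is rejected
    rejected_1d = False
except ValueError as err:
    rejected_1d = True
    print("ValueError:", str(err).splitlines()[0])

Xr_2d = Xr.reshape(-1, 1)                # one column: shape (4, 1)
reg = LinearRegression().fit(Xr_2d, y)   # y can stay 1D
print(Xr_2d.shape, reg.coef_.shape)      # (4, 1) (1,)
```

reshape(-1, 1) lets NumPy infer the number of rows and fixes the number of columns at one. This is the single-feature layout that fit expects.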

Read more about the fit method and what it does here.

Jónás Balázs
Rajith Thennakoon