I ran the following block of code and got a `ValueError`.

Code (here `X_train` and `y_train` are each a pandas Series, i.e. a single column of data):

from sklearn.linear_model import LinearRegression
regressor = LinearRegression(fit_intercept=True)
regressor.fit(X_train, y_train)

Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-167-3392c2ad36e2> in <module>
      2 from sklearn.linear_model import LinearRegression
      3 regressor = LinearRegression(fit_intercept=True)#Instantiating an object of the LinearRegression class.#"fit_intercept = True" is asking the linear regressor to assume that there is a y-intercept.
----> 4 regressor.fit(X_train, y_train) #Passing in our training data

~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in fit(self, X, y, sample_weight)
    461         n_jobs_ = self.n_jobs
    462         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 463                          y_numeric=True, multi_output=True)
    464 
    465         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

The code works after I converted `X_train` and `y_train` to DataFrames with `X = pd.DataFrame(IceCream.Temperature)` and `y = pd.DataFrame(IceCream.Revenue)`. What I do not understand is why this works with DataFrames but not with Series. I am taking a machine learning course from SuperDataScience.com, and the block of code at the top of this question worked for the instructor without converting the Series to DataFrames. Any help will be greatly appreciated.

  • Maybe the `check_X_y` function throws the exception because your `X_train` is not 2d. Check with `X_train.shape` and you will see something like `(n,)` where `n` is the length of `X_train`. When you convert `X_train` to dataframe, it becomes 2D (shape `(n,1)`) and passes the validation. – ATL Jan 07 '20 at 03:26
  • Just convert the series with `np.array(series).reshape(-1, 1)` and then pass it to the model; `np.array(series)` alone is still 1D. – Kallol Jan 07 '20 at 06:33
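The shape difference described in the comments above can be checked directly. This is a minimal sketch: the `temperature` Series and its values are made up as a stand-in for `IceCream.Temperature`, which is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the IceCream.Temperature column (values are made up).
temperature = pd.Series([20.5, 25.1, 30.3, 18.2], name="Temperature")

shape_series = temperature.shape                    # 1D: (n,)
shape_frame = pd.DataFrame(temperature).shape       # 2D: (n, 1)
shape_array = np.array(temperature).shape           # still 1D: (n,)
shape_reshaped = np.array(temperature).reshape(-1, 1).shape  # 2D: (n, 1)

print(shape_series, shape_frame, shape_array, shape_reshaped)
```

Only the DataFrame and the reshaped array have the two dimensions that the validation step expects for `X`.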

1 Answer

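The scikit-learn documentation for LinearRegression

sklearn.linear_model.LinearRegression

clearly states that the fit method expects X : {array-like, sparse matrix} of shape (n_samples, n_features)

A pandas Series is one-dimensional, with shape (n_samples,), so it doesn't fulfill this requirement. A single-column DataFrame has shape (n_samples, 1), which is why the conversion works.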
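As a rough illustration of that requirement, the sketch below fits the model on a single-column DataFrame and shows that passing the Series as `X` raises a `ValueError`. The `ice_cream` frame and its values are invented for the example (revenue is set to 10 * temperature + 5), and it assumes a scikit-learn version that rejects 1D input for `X`, as in the traceback from the question.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data standing in for the IceCream dataset: Revenue = 10 * Temperature + 5.
ice_cream = pd.DataFrame({
    "Temperature": [10.0, 20.0, 30.0, 40.0],
    "Revenue": [105.0, 205.0, 305.0, 405.0],
})

# Double brackets select a one-column DataFrame with shape (n_samples, 1).
X_train = ice_cream[["Temperature"]]
# The target may stay a 1D Series with shape (n_samples,).
y_train = ice_cream["Revenue"]

regressor = LinearRegression(fit_intercept=True)
regressor.fit(X_train, y_train)

# Passing the 1D Series as X fails validation.
try:
    LinearRegression().fit(ice_cream["Temperature"], y_train)
    raised = False
except ValueError:
    raised = True

print(regressor.coef_, regressor.intercept_, raised)
```

Only `X` needs the second dimension, so `y_train` does not have to be converted to a DataFrame.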

Pulkit Jha