
I am trying to run a leave-one-out k-fold validation on a linear regression model I have, but my script keeps ending with NaN values in the scores. x7 is my true values and y7 is my modeled values. Why do I keep getting NaN at the end?

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

x7 = np.array([16.36,24.67,52.31,87.31,3.98,63.45,40.47,35.67,52.12,9.39,57.61,35.77,113.1])
a = np.reshape(x7, (-1,1))
y7 = np.array([19.678974,4.824257,75.617537,62.587548,40.287506,76.576852,38.777129,29.062245,
               50.088907,34.415783,46.466144,44.848378,68.988740])
b = np.reshape(y7, (-1,1))
a_train, a_test, b_train, b_test = train_test_split(x7, y7, test_size=12,
                                                    random_state=None)
train_test_split(b, shuffle=True)
kfolds = KFold(n_splits=13, random_state=None)
model = LinearRegression()
score = cross_val_score(model, a, b, cv=kfolds)
print(score)
  • I believe there were errors during fitting. Set `error_score='raise'` in `cross_val_score` and update your question with an error stack. – Sanjar Adilov Feb 18 '22 at 05:16

1 Answer


If you run it, you will see the warning:

UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.

When you don't pass a `scoring` argument, `cross_val_score` falls back to the estimator's default scorer, which for LinearRegression is R^2. R^2 cannot be calculated for a single sample, and with `n_splits=13` on 13 rows every test fold holds exactly one sample, so every fold's score comes back as NaN.
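To see why one test sample breaks R^2: its denominator is the total sum of squares of the true values in the fold, which is always zero for a single value. A minimal sketch (the sample value is just one of the y7 entries, picked for illustration):

```python
import numpy as np

# One test sample, as produced by each fold of KFold(n_splits=13) on 13 rows
y_true = np.array([50.088907])

# R^2 = 1 - SS_res / SS_tot; with one sample SS_tot is always zero,
# so the ratio is undefined and sklearn reports NaN
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(ss_tot)  # 0.0
```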

In your case, check out the scoring options and decide which one is suitable. One option is mean squared error (sklearn exposes it as the negative of MSE, so that higher is still better):

score = cross_val_score(model, a, b, cv=kfolds, scoring="neg_mean_squared_error")

score

array([ -191.24253413, -1196.96087661,  -849.60502864,   -17.24243385,
        -371.71996402,  -623.67802306,   -21.95720802,  -163.79409063,
          -2.16490531,   -62.32600883,   -29.3290439 ,   -19.44669535,
        -315.64087633])
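Since these scores are negated MSE values, you can recover a per-fold RMSE by flipping the sign and taking the square root. A sketch using the first few scores from the output above:

```python
import numpy as np

# negative-MSE scores as returned by cross_val_score (subset of the output above)
neg_mse = np.array([-191.24253413, -1196.96087661, -849.60502864])

rmse_per_fold = np.sqrt(-neg_mse)   # undo the sign flip, then take the root
mean_rmse = rmse_per_fold.mean()    # one summary number for the whole CV run
```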
StupidWolf