Hi guys, I am trying to perform K-Fold cross-validation on this insurance dataset, using a for loop to iterate over a list of integers (the candidate numbers of folds). Running the code gives me the following error:

ValueError: The number of folds must be of Integral type. [3, 4, 5, 6, 7, 8, 9, 10, 11, 12] of type <class 'list'> was passed.

Can someone please explain what this error means and how to resolve it? Below is my code for K-Fold cross-validation.

import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import VarianceThreshold,mutual_info_classif,mutual_info_regression
from sklearn.feature_selection import SelectKBest, SelectPercentile

data_ = pd.read_csv("insurance.csv") 
print(data_.head())

# Create dummies

data_dummies= pd.get_dummies(data_, columns = ['sex','region','smoker'])
print(data_dummies.head())
data_dummies = pd.DataFrame(data_dummies)
data_cleaned = data_dummies.drop(['sex_female','region_southwest','smoker_no'],axis = 'columns')
X=  data_cleaned.drop(['charges'], axis = 'columns')
y = data_cleaned['charges']
X_train,X_test,y_train,y_test = train_test_split(X,y,random_state = 0)

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
ind = [3,4,5,6,7,8,9,10,11,12]


for  i in ind:


     kfold = KFold(n_splits=ind,shuffle=True,random_state=0)
     model = LinearRegression()
     scores = cross_val_score(model,X,y,cv=kfold,scoring='neg_mean_squared_error')

print(scores)
Emer
Murad24

1 Answer

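You're passing the whole list ind as the n_splits argument of KFold, instead of the loop variable i, which is what you intended. n_splits must be a single integer, which is exactly what the ValueError is telling you. Note also that print(scores) has to sit inside the loop, otherwise only the scores for the last value are printed. Finally, there is no need to allocate a list; you can iterate over a range instead: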

for i in range(3, 13):
    kfold = KFold(n_splits=i,shuffle=True,random_state=0)
    model = LinearRegression()
    scores = cross_val_score(model,X,y,cv=kfold,scoring='neg_mean_squared_error')
    print(scores)
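
If you want one summary number per K in addition to the K per-fold scores, a minimal sketch could look like the following. It uses a synthetic dataset from make_regression as a stand-in for insurance.csv, and the names X_demo, y_demo and mse_by_k are made up for illustration; they are not from the question.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the insurance data (an assumption, not the OP's dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

mse_by_k = {}  # hypothetical container: K -> array of K per-fold MSE values
for k in range(3, 13):
    kfold = KFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(LinearRegression(), X_demo, y_demo,
                             cv=kfold, scoring='neg_mean_squared_error')
    # The scorer returns negated MSE, so flip the sign to get the MSE itself.
    mse_by_k[k] = -scores

for k, mse in mse_by_k.items():
    print(f"K={k}: {len(mse)} fold scores, mean MSE = {mse.mean():.2f}")
```

Each entry of mse_by_k holds exactly K values, one per fold, so you get both the per-fold errors and their mean for every K from 3 to 12.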
Caio Rocha
  • Can you not use a for loop instead of a while? Also, your output gives a 1D array. – Murad24 Apr 02 '20 at 19:16
  • I am trying to find the error for K=3, K=4, ..., K=12. For each K there should be K outputs. – Murad24 Apr 02 '20 at 19:19
  • As you see, departing from the exact context of a question (and without any actual gain, since your `while` loop is practically the same as the `for` loop + list used by OP) can easily lead to miscommunication issues, and it should be avoided when not absolutely necessary. – desertnaut Apr 03 '20 at 13:56
  • @desertnaut I had no intent to diverge from the original question, and what you said is totally valid, so I've updated my answer. Thank you for the observation. – Caio Rocha Apr 04 '20 at 03:47