I'm looking for the easiest way to teach my students how to perform 10-fold cross-validation (10CV) with standard classifiers in sklearn such as LogisticRegression, KNN, decision tree, AdaBoost, SVM, etc.

I was hoping there was a method that created the folds for them instead of having to loop like below:

from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=0)

X=df1.drop(['Unnamed: 0','ID','target'],axis=1).values
y=df1.target.values


for train_index, test_index in sss.split(X,y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    clf = LogisticRegressionCV()
    clf.fit(X_train, y_train)
    test_predictions = clf.predict(X_test)
    acc = accuracy_score(y_test, test_predictions)
    print(acc)

Seems like there should be an easier way.

dorien
    Does this answer your question? [How to correctly perform cross validation in scikit-learn?](https://stackoverflow.com/questions/55270431/how-to-correctly-perform-cross-validation-in-scikit-learn) – PV8 Oct 02 '20 at 05:14
  • the thing you are looking for its called: cross validation and there are several questions regarding that topic around here – PV8 Oct 02 '20 at 05:14
  • I know it's cross validation. I'm looking for the most concise function in python that implements it. – dorien Oct 02 '20 at 09:02
  • @PV8 not a bad solution! – dorien Oct 02 '20 at 09:03

1 Answer

I think your question is whether there is an existing method for 10-fold cross-validation, so the looping is done for you. There is: the sklearn user guide on cross validation explains the concept and how to use it, and the sklearn.model_selection module provides the corresponding helpers.

To include a code example that should work with your code, import the required function

from sklearn.model_selection import cross_val_score

and use this line instead of your loop:

print(cross_val_score(clf, X, y, cv=10))

By the way, n_splits is set to 1 in your code, so you are only evaluating a single train/test split there, not 10 folds.
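Since the question mentions teaching several classifiers, here is a minimal sketch of looping cross_val_score over a few of them. It uses load_iris as a stand-in dataset, because the original df1 is not available; for classifiers, cross_val_score with an integer cv uses stratified folds automatically.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; replace with your own X and y from df1
X, y = load_iris(return_X_y=True)

classifiers = [
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
    DecisionTreeClassifier(random_state=0),
]

results = {}
for clf in classifiers:
    # cv=10 -> stratified 10-fold CV for classifiers; returns 10 accuracy scores
    scores = cross_val_score(clf, X, y, cv=10)
    results[type(clf).__name__] = scores.mean()
    print(type(clf).__name__, scores.mean())
```

Each entry in results is the mean accuracy over the 10 folds, which makes it easy for students to compare models side by side.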

Kim Tang
  • Thank you this is helpful. How would I include normalization though, e.g. MinMaxScaler()? (This would have to be done on train and test folds separately) – dorien Oct 05 '20 at 09:30
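For the scaling question in the comment, the usual approach is to put the scaler and the classifier into a Pipeline and cross-validate the pipeline; cross_val_score then fits the scaler on each training fold only and applies it to the corresponding test fold, avoiding leakage. A minimal sketch, again using load_iris as a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in dataset; replace with your own X and y
X, y = load_iris(return_X_y=True)

# The scaler is refit on each training fold inside cross_val_score,
# so the test fold never influences the scaling parameters
pipe = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=10)
print(scores.mean())
```

This is the standard way to combine per-fold preprocessing with cross-validation in sklearn.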