How to get the folds that sklearn.cross_validation.cross_val_score creates internally?

Question

I'm using:

sklearn.cross_validation.cross_val_score

to run cross-validation and get the result of each run.

The output of this function is the scores.

Is there a way to get the folds (partitions) that cross_val_score creates internally?

score 2 · Answer 1 · answered Jul 08 '14 at 08:00

The default cross-validator for cross_val_score is a StratifiedKFold with K=3 for classification. Instead, you can create a StratifiedKFold cross-validation iterator yourself and loop over its splits, as shown in the example.
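A minimal sketch of such a loop, assuming the newer sklearn.model_selection API (scikit-learn 0.18 and later) rather than the sklearn.cross_validation module from the question. The names folds and scores are illustrative, and n_splits=3 is set explicitly to match the K=3 mentioned above, since newer releases default to 5 folds:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=3)

folds = []   # (train_idx, test_idx) index arrays, one pair per split
scores = []  # accuracy on the held-out part of each split

for train_idx, test_idx in skf.split(X, y):
    clf = SVC(kernel="linear", C=1)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
    folds.append((train_idx, test_idx))
```

This does by hand roughly what cross_val_score does for you, but the index arrays of every fold remain available in folds afterwards.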

Kyle Kastner

Yes, I know I can do that, but cross_val_score saves a lot of effort, so I'm looking for a way to extract the folds while using it. – eman Jul 08 '14 at 09:15

YS-L · Accepted Answer · 2014-07-08T09:41:23.980

There isn't a way to extract the internal cross-validation splits used by cross_val_score, as this function does not expose any state about them. As mentioned in the documentation, either a k-fold or a stratified k-fold with k=3 will be used.

However, if you need to keep track of the cross-validation splits used, you can create your own cross-validation iterator and pass it explicitly as the cv argument of cross_val_score:

from sklearn.cross_validation import KFold, cross_val_score
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
kf = KFold(len(iris.target), 5, random_state=0)
clf = SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=kf)

so that it uses the splits you specified exactly instead of rolling its own.
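With the newer sklearn.model_selection API (scikit-learn 0.18 and later), the same idea might look like the sketch below. It is an assumption-laden variant rather than the code from this answer: the splitter no longer takes the number of samples, the splits are generated by calling split, and shuffle=True is added here so that random_state actually has an effect. Materialising the splits into a list (the name folds is illustrative) keeps them around for inspection:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Build the splitter and store its (train_idx, test_idx) pairs up front.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
folds = list(kf.split(X))

clf = SVC(kernel="linear", C=1)

# cv also accepts an iterable of (train, test) index arrays,
# so the scores correspond exactly to the stored folds.
scores = cross_val_score(clf, X, y, cv=folds)

for (train_idx, test_idx), score in zip(folds, scores):
    print(len(train_idx), len(test_idx), score)
```

Because the same list is passed to cross_val_score, scores[i] is the score obtained on folds[i].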

edited Jul 08 '14 at 09:41

answered Jul 08 '14 at 09:35

I want to make sure of something: is there any randomness in splitting the partitions, or are the same partitions generated for the same k-fold settings and target labels? – eman Jul 09 '14 at 11:06

Any randomness in the splits is controlled by the argument ``random_state``. If you do not specify it, ``numpy``'s internal random state will be used and results over multiple runs might differ. For ``KFold``, randomness only applies if ``shuffle=True``. – YS-L Jul 09 '14 at 11:19
In StratifiedKFold there is no parameter called "shuffle" or "random_state", so what happens in that case? – eman Jul 09 '14 at 11:53
At least as of [sklearn version 0.18](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html#sklearn.model_selection.StratifiedKFold), it does have those parameters. – Clay Dec 30 '16 at 18:20
