1

I'm using:

sklearn.cross_validation.cross_val_score

to run cross-validation and get the score of each run.

The output of this function is the scores.

Is there a way to get the folds (partitions) themselves that cross_val_score creates internally?

eman

2 Answers

2

The default cross validator for cross_val_score is a StratifiedKFold with K=3 for classification. You can get a cross validation iterator instead, by using the StratifiedKFold and looping over the splits as shown in the example.
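Looping over the splits might look like the sketch below. It assumes a recent scikit-learn, where the iterators live in `sklearn.model_selection` rather than `sklearn.cross_validation`, and it uses the iris data purely as a stand-in dataset. The fold count of 3 mirrors the default mentioned above; newer releases may use a different default.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

iris = load_iris()
X, y = iris.data, iris.target

# Build the iterator explicitly so the folds are available to inspect.
skf = StratifiedKFold(n_splits=3)

# split() yields one (train_indices, test_indices) pair per fold.
folds = list(skf.split(X, y))

for i, (train_idx, test_idx) in enumerate(folds):
    print("fold", i, "train size:", len(train_idx), "test size:", len(test_idx))
```

Each entry in `folds` is a pair of integer index arrays, so `X[test_idx]` and `y[test_idx]` give the held-out samples for that fold.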

Kyle Kastner
  • Yes, I know I can do that, but cross_val_score saves a lot of effort, so I'm looking for a way to extract the folds while using it. – eman Jul 08 '14 at 09:15
2

There isn't a way to extract the internal cross-validation splits used by cross_val_score, as the function does not expose any state about them. As mentioned in the documentation, either a k-fold or stratified k-fold with k=3 will be used.

However, if you need to keep track of the cross-validation splits used, you can create your own cross-validation iterator and pass it explicitly as the cv argument of cross_val_score:

from sklearn.cross_validation import KFold, cross_val_score
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
kf = KFold(len(iris.target), 5, random_state=0)
clf = SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=kf)

so that it uses the splits you specified exactly instead of rolling its own.
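For readers on a newer scikit-learn, here is a rough equivalent that also keeps the folds around for inspection. It is only a sketch under a few assumptions: the `sklearn.model_selection` module (which replaced `sklearn.cross_validation`), `shuffle=True` so that `random_state` has an effect, and a `folds` list that is my own addition rather than anything `cross_val_score` returns.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# shuffle=True is an assumption here; without it random_state is not used.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Materialize the (train_indices, test_indices) pairs once, so the exact
# folds used for scoring can be inspected afterwards.
folds = list(kf.split(X))

clf = SVC(kernel='linear', C=1)

# cv also accepts an iterable of (train, test) index pairs.
scores = cross_val_score(clf, X, y, cv=folds)

for (train_idx, test_idx), score in zip(folds, scores):
    print("test samples:", len(test_idx), "score:", score)
```

Because the same `folds` list is passed to `cross_val_score`, `scores[i]` corresponds to the split in `folds[i]`.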

YS-L
  • I want to make sure of something: is there any randomness in how the partitions are split, or are the same partitions generated for the same KFold settings and target labels? – eman Jul 09 '14 at 11:06
  • 1
    There is randomness in the splits, controlled by the argument ``random_state``. If you do not specify it, ``numpy``'s internal random state will be used and results may differ between runs. For ``KFold``, randomness only applies if ``shuffle=True``. – YS-L Jul 09 '14 at 11:19
  • In StratifiedKFold there is no parameter called "shuffle" or "random_state", so what happens in that case? – eman Jul 09 '14 at 11:53
  • At least as of [sklearn version 0.18](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html#sklearn.model_selection.StratifiedKFold), it does have those parameters. – Clay Dec 30 '16 at 18:20
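
The reproducibility point from the comments can be checked with a small sketch like the one below. It assumes the `sklearn.model_selection` API (0.18 or later); the toy arrays and the helper `fold_test_indices` are hypothetical names introduced only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy data: 20 samples, two balanced classes (made up for this example).
y = np.array([0] * 10 + [1] * 10)
X = np.zeros((20, 1))


def fold_test_indices(cv):
    """Return the test indices of every fold as plain Python lists."""
    return [test_idx.tolist() for _, test_idx in cv.split(X, y)]


# Without shuffling, the splits depend only on the data order.
plain_a = fold_test_indices(KFold(n_splits=5))
plain_b = fold_test_indices(KFold(n_splits=5))

# With shuffling, fixing random_state makes the splits repeatable.
seeded_a = fold_test_indices(StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
seeded_b = fold_test_indices(StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

print(plain_a == plain_b)
print(seeded_a == seeded_b)
```

If `shuffle=True` is used without a fixed `random_state`, two runs are not guaranteed to produce the same folds.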