I'm having trouble understanding how KFold cross-validation works in the new model_selection module. I'm using the Naive Bayes classifier and would like to test it with cross-validation. My test and train data are split like this:
test_set = posRevBag[:200] + negRevBag[:200]
train_set = posRevBag[200:] + negRevBag[200:]
and each element is a (feature dict, label) tuple, e.g. ({'one': True, 'last': True, ...}, 'pos').
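(For context, here is roughly how I build those pairs; word_feats and the posReviews/negReviews lists are simplified stand-ins for my actual preprocessing:)

def word_feats(words):
    # mark every token of a review as a present feature
    return {word: True for word in words}

# posReviews/negReviews: lists of tokenized reviews (stand-ins)
posRevBag = [(word_feats(review), 'pos') for review in posReviews]
negRevBag = [(word_feats(review), 'neg') for review in negReviews]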
I know that with the old cross_validation module I would have had something like:

import nltk
from sklearn import cross_validation

cv = cross_validation.KFold(len(train_set), n_folds=10, shuffle=False, random_state=None)
for traincv, testcv in cv:
    # traincv and testcv are arrays of indices into train_set
    classifier = nltk.NaiveBayesClassifier.train([train_set[i] for i in traincv])
    print('accuracy:', nltk.classify.util.accuracy(classifier, [train_set[i] for i in testcv]))
For the new version I saw that KFold no longer takes the length of the training set, and that it uses a split() method I'm not quite familiar with, since I split my test and train sets manually as shown above.
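From reading the sklearn.model_selection docs, I assume the translation would look roughly like the sketch below: the fold count moves to n_splits, and the data itself is passed to split(), which yields the same kind of train/test index arrays as before (the loop body just mirrors my old code):

import nltk
from sklearn.model_selection import KFold

# n_splits replaces n_folds; the data goes to split() instead of
# passing len(data) to the constructor
kf = KFold(n_splits=10, shuffle=False)

for traincv, testcv in kf.split(train_set):
    # traincv and testcv are index arrays, like the old API returned
    classifier = nltk.NaiveBayesClassifier.train([train_set[i] for i in traincv])
    print('accuracy:', nltk.classify.util.accuracy(classifier, [train_set[i] for i in testcv]))

Is that the correct way to use split() with a manually built training set like mine?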