I want to use 10-fold cross-validation to evaluate an NLTK classification model. My data is in a pandas DataFrame named data (10k rows and 10 classes).
Features: hello_variant, goodbye_variant, wh_question, yesNo_question, conjuction_start, No_of_tokens
I tried the code below, but it gives an error:
extract_features = data.drop(['class'],axis=1)
documents = data['class']
import nltk
from sklearn import cross_validation
training_set = nltk.classify.apply_features(extract_features, documents)
cv = cross_validation.KFold(len(training_set), n_folds=10, shuffle=False, random_state=None)
for traincv, testcv in cv:
    classifier = nltk.NaiveBayesClassifier.train(training_set[traincv[0]:traincv[len(traincv)-1]])
    print 'accuracy:', nltk.classify.util.accuracy(classifier, training_set[testcv[0]:testcv[len(testcv)-1]])
Error:
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-253-2ddaf7264527> in <module>()
>       1 import nltk
>       2 from sklearn import cross_validation
> ----> 3 training_set = nltk.classify.apply_features(extract_features, documents)
>       4 cv = cross_validation.KFold(len(training_set), n_folds=10, shuffle=False, random_state=None)
>       5
>
> C:\Users\SampathR\Anaconda2\envs\dato-env\lib\site-packages\nltk\classify\util.pyc in apply_features(feature_func, toks, labeled)
>      60     """
>      61     if labeled is None:
> ---> 62         labeled = toks and isinstance(toks[0], (tuple, list))
>      63     if labeled:
>      64         def lazy_func(labeled_token):
>
> C:\Users\SampathR\Anaconda2\envs\dato-env\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
>     712         raise ValueError("The truth value of a {0} is ambiguous. "
>     713                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
> --> 714                          .format(self.__class__.__name__))
>     715
>     716     __bool__ = __nonzero__
>
> ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
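
For context, the traceback points at `toks and isinstance(toks[0], ...)` inside `apply_features`, which is evaluated on the pandas Series passed as the second argument. NLTK classifiers are usually trained on a plain Python list of (feature_dict, label) pairs. Below is a minimal sketch, in plain Python, of building such pairs and cutting them into 10 contiguous, unshuffled folds. The sample rows and the helper names `to_featuresets` and `k_fold_slices` are hypothetical, and the NLTK training call appears only as a comment.

```python
def to_featuresets(rows, label_key="class"):
    """Turn row dicts into (feature_dict, label) pairs."""
    featuresets = []
    for row in rows:
        # Everything except the label column becomes a feature.
        features = {k: v for k, v in row.items() if k != label_key}
        featuresets.append((features, row[label_key]))
    return featuresets


def k_fold_slices(items, n_folds=10):
    """Yield (train, test) lists for contiguous, unshuffled folds."""
    n = len(items)
    for i in range(n_folds):
        start = i * n // n_folds
        stop = (i + 1) * n // n_folds
        test = items[start:stop]
        # Training data is everything outside the test slice.
        train = items[:start] + items[stop:]
        yield train, test


# Hypothetical sample rows. With the DataFrame from the question they might
# come from data.to_dict('records') (an assumption, not run on the real data).
rows = [
    {
        "hello_variant": i % 2 == 0,
        "No_of_tokens": i % 7,
        "class": "greeting" if i % 2 == 0 else "other",
    }
    for i in range(20)
]
featuresets = to_featuresets(rows)

for train, test in k_fold_slices(featuresets, n_folds=10):
    # With NLTK installed, each fold could be scored roughly like this:
    #   classifier = nltk.NaiveBayesClassifier.train(train)
    #   accuracy = nltk.classify.util.accuracy(classifier, test)
    assert len(train) + len(test) == len(featuresets)
```

Unlike slicing from the first to the last training index, this keeps the test fold out of the training data for the middle folds as well.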
Further, I want to get the precision, recall, and F-score for each dialog act (class) in the corpus, as well as the accuracy and the confusion matrix of the classifier. Is there any method available in NLTK (other than sklearn) to calculate those?
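
To make the requested metrics concrete, here is a library-independent sketch that computes per-class precision, recall, and F-score, plus overall accuracy and a confusion matrix, from two parallel label lists. The function name `per_class_scores` and the sample labels are hypothetical. NLTK's `nltk.metrics` module also provides `precision`, `recall`, and `f_measure` (which operate on sets) and a `ConfusionMatrix(reference, test)` class, but this sketch does not rely on them.

```python
from collections import Counter


def per_class_scores(reference, predicted):
    """Return (accuracy, confusion, scores) for two parallel label lists.

    confusion maps (true_label, predicted_label) -> count;
    scores maps label -> (precision, recall, f1).
    """
    confusion = Counter(zip(reference, predicted))
    labels = sorted(set(reference) | set(predicted))
    correct = sum(confusion[(label, label)] for label in labels)
    accuracy = correct / float(len(reference))

    scores = {}
    for label in labels:
        tp = confusion[(label, label)]
        # Column sum: everything the classifier labeled as `label`.
        predicted_as = sum(confusion[(other, label)] for other in labels)
        # Row sum: everything whose true class is `label`.
        actually = sum(confusion[(label, other)] for other in labels)
        precision = tp / float(predicted_as) if predicted_as else 0.0
        recall = tp / float(actually) if actually else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return accuracy, confusion, scores


# Hypothetical labels for illustration only.
reference = ["greeting", "greeting", "question", "question", "other"]
predicted = ["greeting", "question", "question", "question", "other"]
accuracy, confusion, scores = per_class_scores(reference, predicted)
```

In a cross-validation loop, `reference` would be the true labels of the test fold and `predicted` the classifier's output for the same items, for example from `classifier.classify(features)`.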