I am trying to use NLTK to train a Naive Bayes classifier for multi-class text classification, but I do not have access to the original texts. All I am given is a file in SVM Light format (one instance per line, as feature:value pairs). I simply have to import this file, then train and test a Naive Bayes classifier on this dataset. Is there some way to import this file into NLTK and use it directly for training classifiers?
1 Answer
According to NLTK's own documentation, this can be done through its scikit-learn wrapper, roughly like this:
Excerpt from Documentation:
scikit-learn (http://scikit-learn.org) is a machine learning library for Python. It supports many classification algorithms, including SVMs, Naive Bayes, logistic regression (MaxEnt) and decision trees.
This package implements a wrapper around scikit-learn classifiers. To use this wrapper, construct a scikit-learn estimator object, then use that to construct a SklearnClassifier. E.g., to wrap a linear SVM with default settings:
Example:
>>> from sklearn.svm import LinearSVC
>>> from nltk.classify.scikitlearn import SklearnClassifier
>>> classif = SklearnClassifier(LinearSVC())
See: http://www.nltk.org/api/nltk.classify.html#module-nltk.classify.scikitlearn
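The excerpt shows how to wrap an estimator but not how to get SVM Light data into it. Below is a minimal sketch of one way to bridge that gap, assuming the usual `<label> <index>:<value> ...` line layout and non-negative feature values (which `MultinomialNB` requires). The helper names `parse_svmlight_line`, `load_svmlight_featuresets` and `train_naive_bayes` are hypothetical, not part of NLTK or scikit-learn; they turn each line into the `(featureset, label)` pair that `SklearnClassifier.train` expects.

```python
def parse_svmlight_line(line):
    """Parse one SVM Light line into an NLTK-style (featureset, label) pair."""
    # Drop a trailing "# comment", if any, then split on whitespace.
    tokens = line.split("#", 1)[0].split()
    label, featureset = tokens[0], {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        if index == "qid":  # ranking query id, not a feature
            continue
        # Feature indices become string keys, values stay numeric.
        featureset[index] = float(value)
    return featureset, label


def load_svmlight_featuresets(path):
    """Read a whole SVM Light file into a list of (featureset, label) pairs."""
    with open(path, encoding="utf-8") as handle:
        return [
            parse_svmlight_line(line)
            for line in handle
            # Skip blank lines and lines that are only a comment.
            if line.strip() and not line.lstrip().startswith("#")
        ]


def train_naive_bayes(featuresets):
    """Train scikit-learn's multinomial Naive Bayes through NLTK's wrapper."""
    # Imported here so the parsing helpers work without these packages.
    from nltk.classify.scikitlearn import SklearnClassifier
    from sklearn.naive_bayes import MultinomialNB

    classifier = SklearnClassifier(MultinomialNB())
    classifier.train(featuresets)
    return classifier
```

With a training file at a path of your choosing, `train_naive_bayes(load_svmlight_featuresets(path))` should return a classifier whose `classify_many` method accepts a list of featureset dicts. The sketch wraps `MultinomialNB` rather than using `nltk.NaiveBayesClassifier` because the latter treats every distinct feature value as a separate category, which is probably not what you want for numeric weights. If you do not need NLTK at all, `sklearn.datasets.load_svmlight_file` reads the same format straight into a sparse matrix.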

alvas

James Mills
Do note that you're using `sklearn` to convert the corpus into SVM format and then NLTK's wrapper around `sklearn` to call the classifier. – alvas Mar 24 '14 at 06:43