I am using Weka for classification, with SMO as the classifier for my documents. In some situations SMO returns the wrong category.

For example, take two categories, Computer and Cricket. I first trained a model on these two categories. Then I tested a document whose content relates to both categories in a 50:50 ratio. SMO returns only the first category, Computer. When the ratio is 50:50, I need both categories to be returned.

How can I achieve multiclass classification with the SMO classifier?

SANN3
  • Can you give more information about your problem? – Atilla Ozgur Aug 24 '12 at 11:19
  • Your title mentions [multi-class classification](http://en.wikipedia.org/wiki/Multiclass_classification), and the content of your question suggests [multi-label classification](http://en.wikipedia.org/wiki/Multi-label_classification). Those are not the same. – dzieciou Mar 01 '13 at 22:38

2 Answers

Normally a classifier returns a single result. As I understand your question, you need distributionForInstance. This method gives you the probability of each class. In your example you would expect to get 1/2 and 1/2 as the probabilities.

You mention:

"Yes, that method is what gives me the wrong probability. For the computer class I am getting 0.63, and 0.36 for cricket. But the content and number of words are equal for both categories."

The problem with your interpretation is that you expect the class probabilities to depend only on the content and the number of words. That is true for Naive Bayes, for example, but not in general for other classifiers. If you try the same classification with Naive Bayes, you may see the probabilities you expect.

With an SVM, the class probabilities are derived from the support vectors. This means that, judged by the support vectors, the class with probability 0.63 is the more probable one.
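If the goal is to report both categories when they are roughly balanced, one option is to threshold the distribution yourself instead of taking only the top class. The sketch below is plain Java and does not call Weka: the `double[]` stands in for what `distributionForInstance` would return, the values 0.63 and 0.36 are the ones reported in the question, and the class name, method name, and the 0.30 cut-off are assumptions chosen for illustration. As far as I know, SMO only produces meaningful probability estimates when logistic model fitting is enabled (`setBuildLogisticModels(true)`), so treat that as something to verify in your setup.

```java
import java.util.ArrayList;
import java.util.List;

class ThresholdLabels {

    /** Returns every label whose probability is at least minProbability. */
    static List<String> labelsAbove(String[] labels, double[] distribution, double minProbability) {
        if (labels.length != distribution.length) {
            throw new IllegalArgumentException("labels and distribution must have the same length");
        }
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            if (distribution[i] >= minProbability) {
                selected.add(labels[i]);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        String[] labels = {"computer", "cricket"};
        // Values reported in the question; in Weka they would come from
        // classifier.distributionForInstance(instance).
        double[] distribution = {0.63, 0.36};

        // Hypothetical cut-offs; tune them on held-out documents.
        System.out.println(labelsAbove(labels, distribution, 0.30)); // [computer, cricket]
        System.out.println(labelsAbove(labels, distribution, 0.50)); // [computer]
    }
}
```

This only approximates multi-label output, because the probabilities of a single multiclass model are forced to sum to 1 and therefore compete with each other.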

Atilla Ozgur
  • Yes, that method is what gives me the wrong probability. For the computer class I am getting 0.63, and 0.36 for cricket. But the content and number of words are equal for both categories. – SANN3 Aug 24 '12 at 13:27
  • OK, thanks for your suggestion. But I have already tried other classifiers. Only SVM gives me good results for single-class classification. I need a good classifier for multiclass classification. – SANN3 Aug 25 '12 at 16:02

I know people use different terminology, but the most commonly accepted term for your problem is "multi-label classification" (https://en.wikipedia.org/wiki/Multi-label_classification).

I think the Wikipedia article mentioning multiclass classification is incorrectly written, or it uses the terminology of a different domain that applies similar methods.

Multiclass classification usually means assigning a data point to exactly one of many (>2) possible classes. Multi-label classification, by contrast, allows a data point to be assigned several of the possible classes at once.

You can look at Meka, an extension of Weka with some multi-label classifiers implemented. I know you want to use Weka, but if that is not a hard requirement, you could also try multi-label libsvm.
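To make the distinction concrete, here is a minimal sketch of binary relevance, one common multi-label strategy: train one independent yes/no classifier per label and keep every label whose classifier fires. It is plain Java and does not use the Meka or libsvm APIs. The class name, the `keywordScorer` helper, the keyword lists, and the 0.5 threshold are all hypothetical stand-ins for real trained binary models.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

class BinaryRelevanceSketch {

    /** Runs one independent scorer per label and keeps labels scoring at least threshold. */
    static List<String> predict(Map<String, ToDoubleFunction<String>> scorers,
                                String document, double threshold) {
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, ToDoubleFunction<String>> entry : scorers.entrySet()) {
            if (entry.getValue().applyAsDouble(document) >= threshold) {
                labels.add(entry.getKey());
            }
        }
        return labels;
    }

    /** Toy stand-in for a trained binary classifier: fraction of keywords found in the text. */
    static ToDoubleFunction<String> keywordScorer(String... keywords) {
        return document -> {
            String lower = document.toLowerCase();
            int hits = 0;
            for (String keyword : keywords) {
                if (lower.contains(keyword)) {
                    hits++;
                }
            }
            return (double) hits / keywords.length;
        };
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the labels in insertion order.
        Map<String, ToDoubleFunction<String>> scorers = new LinkedHashMap<>();
        scorers.put("computer", keywordScorer("cpu", "software"));
        scorers.put("cricket", keywordScorer("wicket", "batsman"));

        System.out.println(predict(scorers, "The batsman checked the software", 0.5)); // [computer, cricket]
        System.out.println(predict(scorers, "New CPU and software released", 0.5));    // [computer]
    }
}
```

Because each label is scored on its own, the scores do not have to sum to 1, so a document that is half about computers and half about cricket can receive both labels.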

user1669710