
I have a training dataset (text) for a particular category (say, cancer). I want to train an SVM classifier for this class in Weka. I created a folder 'cancer', put all the training files in it, and ran my code, but I get the following error: weka.classifiers.functions.SMO: Cannot handle unary class!

What I want is this: when the classifier sees a document related to 'cancer', it reports that class name, and when I feed it a non-cancer document, it says something like 'unknown'.

What should I do to get this behavior?

samsamara

1 Answer


The SMO classifier in Weka needs at least two class values to train on, so it rejects a dataset in which every example has the same class. Sequential Minimal Optimization is a specific algorithm for solving the SVM optimization problem, and Weka's SMO is a basic implementation of it. If you have some examples that are cancer and some that are not, then that is a binary problem, and perhaps you haven't labeled them correctly.

However, if you are using training data which is all examples of cancer and you want it to tell you whether a future example fits the pattern or not, then you are attempting to do one-class SVM, aka outlier detection.

LibSVM in Weka can handle one-class SVM. Unlike the Weka SMO implementation, LibSVM is a separate library that has been interfaced into Weka, and it incorporates many different variants of SVM. This post on the Wekalist explains how to use LibSVM for this in Weka.
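As a rough sketch of what the one-class setup might look like, the helper below assembles an option array for Weka's LibSVM wrapper. The class and method names (`OneClassOptions`, `build`) are made up for illustration, and the example values for nu and gamma are arbitrary. The flags themselves are the wrapper's documented ones: `-S 2` selects one-class SVM, `-K 2` the RBF kernel, `-N` sets nu, and `-G` sets gamma.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds option strings for Weka's LibSVM wrapper in one-class mode. */
public class OneClassOptions {

    /**
     * @param nu    upper bound on the fraction of training instances treated as outliers, in (0, 1]
     * @param gamma RBF kernel width, must be positive
     */
    public static String[] build(double nu, double gamma) {
        if (nu <= 0.0 || nu > 1.0) {
            throw new IllegalArgumentException("nu must be in (0, 1]");
        }
        if (gamma <= 0.0) {
            throw new IllegalArgumentException("gamma must be positive");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-S"); opts.add("2");                   // SVM type 2 = one-class SVM
        opts.add("-K"); opts.add("2");                   // kernel type 2 = RBF
        opts.add("-N"); opts.add(String.valueOf(nu));    // nu
        opts.add("-G"); opts.add(String.valueOf(gamma)); // gamma
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Example values only; they are assumptions, not tuned settings.
        System.out.println(String.join(" ", build(0.1, 0.01)));
    }
}
```

With weka.jar and libsvm.jar on the classpath, an array like this could be passed to `setOptions` on a `weka.classifiers.functions.LibSVM` instance before calling `buildClassifier`. Since nu bounds the fraction of training instances that may be flagged as outliers, it is one of the first parameters worth tuning.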

karenu
  • Thanks for your answer. I got it working. What is the difference between having two sets (say cancer and non_cancer) and doing a binary classification, and having a single set (only cancer) and doing a one-class classification, if I only want to determine how many documents are relevant to cancer (one class)? – samsamara May 01 '12 at 15:38
  • The difference is whether you have negative examples. If you have 100 cancer examples and 100 non-cancer examples, then that's two classes, cancer and non-cancer. If you only have say 100 healthy examples and want to know if anything is abnormal, then that's one-class. – karenu May 02 '12 at 15:19
  • I did a one-class training with LibSVM in Weka. But the problem is that during testing, all the test instances are classified as the class I used in training. Not a single instance is flagged as irrelevant, even though I know for sure that the test instances are totally unrelated to that class. What could be the reason for this? – samsamara May 02 '12 at 18:40
  • Did you do parameter tuning? SVM is very sensitive to its parameters; it's not an 'out of the box' solution. This document from libsvm is a great introduction: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf – karenu May 02 '12 at 18:46
  • A couple of other things to check: if you have two classes, did the training file include the class attribute and/or examples from both classes? If you chose one-class SVM and just fed it a file with two classes, it may have considered the class variable to be an attribute and included it in the model, in which case if you feed it anything with either value for that attribute it's going to consider it part of the class. – karenu May 02 '12 at 18:48
  • Thanks. Now I'm using "double pred = myClassi.classifyInstance(myInstance);" to get the predicted value. I tested with 6 instances from the same training set and now I get 4 instances as NaN and 2 as 0.0. I guess here NaN = outlier and 0.0 = relevant, right? But how do I get 4 outliers? They are the same instances I used for training. What could be the reason here? I didn't do any parameter tuning, only -S 2 to select unary classification. – samsamara May 03 '12 at 13:19
  • @KillBill I have the same issue. All the test instances are classified to the class I used in training. What did you do to solve it? – xro7 Aug 19 '16 at 13:45
  • @karenu The link to the post that explains how to use LibSVM for unary class classification is down or has permanently moved. Can you please check and provide a new link? – drp Sep 07 '16 at 10:35
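To make the NaN question from the comments concrete, here is a minimal sketch of turning raw predictions into the 'cancer' / 'unknown' labels the question asks for. It assumes, as the comment above guesses, that the one-class model returns NaN (Weka's encoding of a missing value) for an outlier and a class index otherwise. The class and method names are hypothetical, and the prediction values are made up rather than taken from a real run.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps one-class predictions to readable labels. */
public class OneClassPrediction {

    /** Returns "unknown" for NaN (assumed to mean outlier), otherwise the trained class name. */
    public static String label(double pred, String className) {
        return Double.isNaN(pred) ? "unknown" : className;
    }

    /** Labels a batch of predictions, e.g. values collected from classifyInstance calls. */
    public static List<String> labelAll(double[] preds, String className) {
        List<String> labels = new ArrayList<>();
        for (double p : preds) {
            labels.add(label(p, className));
        }
        return labels;
    }

    public static void main(String[] args) {
        // Made-up predictions for illustration, not output from a real model.
        double[] preds = {0.0, Double.NaN, Double.NaN, 0.0};
        System.out.println(labelAll(preds, "cancer")); // [cancer, unknown, unknown, cancer]
    }
}
```

Inside Weka code, `weka.core.Utils.isMissingValue(pred)` performs the same check. Getting several training instances back as outliers is not necessarily a bug: in a one-class SVM, nu bounds the fraction of training instances that may fall outside the learned region, and LibSVM's default of 0.5 is quite permissive, so lowering nu is a reasonable first thing to try.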