
I want to do one-class classification with LibSVM in MATLAB.

I want to train on my data and use cross-validation, but I don't know how I should label the outliers.

If for example I have this data:

trainData =  [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
labelTrainData = [-1 -1 -1 -1 0 0 0 0];  

(The first four rows are examples of the one class; the other four are examples of outliers, included just for the cross-validation.)

And I train the model using this:

model = svmtrain(labelTrainData, trainData , '-s 2 -t 0 -d 3 -g 2.0 -r 2.0 -n 0.5 -m 40.0 -c 0.0 -e 0.0010 -p 0.1 -v 2' );

I'm not sure which value to use to label the one-class data and which to use for the outliers. Does anyone know how to do this?

Thanks in advance. -Jessica

jessica
  • Check the following post. With one-class SVM, as the name implies, you only have one class in the training set: http://stackoverflow.com/questions/14588967/one-class-svm-libsvm – Cici May 14 '13 at 18:55
  • Thanks. However, I still have one doubt: is it not possible to use cross-validation with one-class data plus outliers? I used the Weka wrapper, and if I use instances labeled with '?' they are just ignored in the training process. – jessica May 14 '13 at 21:22
  • Also, which label would be correct for the one class? Just any number, or should I strictly use -1 or some specific value? – jessica May 14 '13 at 21:24
  • Any number should be fine as a label (try using different labels and see if that changes your classifier)... not sure about cross-validation though. – Cici May 15 '13 at 14:51

1 Answer

According to http://www.joint-research.org/wp-content/uploads/2011/07/lukashevich2009Using-One-class-SVM-Outliers-Detection.pdf, "Due to the lack of class labels in the one-class SVM, it is not possible to optimize the kernel parameters using cross-validation". However, according to the LIBSVM FAQ, that is not quite correct:

Q: How do I choose parameters for one-class SVM as training data are in only one class? You have pre-specified true positive rate in mind and then search for parameters which achieve similar cross-validation accuracy.
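
One way to read the FAQ's advice is sketched below in MATLAB, using LIBSVM's own `svmtrain`. Train on inliers only, label them all +1, and pass `-v` so that the returned cross-validation accuracy is the percentage of held-out inliers that the model accepts. This relies on my understanding that a one-class model predicts +1 for inliers and that the accuracy compares predictions with the supplied labels. The data in `inliers`, the target rate, and the grid values are all made up for illustration.

```matlab
% Hypothetical inlier-only training set (rows = samples); replace with real data.
rng(0);
inliers = 1 + 0.25 * randn(40, 3);
labels  = ones(size(inliers, 1), 1);   % column vector of +1 (type double)

targetTPR = 90;                        % assumed target: accept about 90% of inliers
nus    = [0.05 0.1 0.2 0.5];           % illustrative grid, not a recommendation
gammas = [0.01 0.1 1 10];

best = struct('nu', NaN, 'gamma', NaN, 'gap', Inf);
for nu = nus
    for g = gammas
        opts = sprintf('-s 2 -t 2 -n %g -g %g -v 5 -q', nu, g);
        % With -v, LIBSVM's svmtrain returns the CV accuracy in percent, not a model.
        acc = svmtrain(labels, inliers, opts);
        gap = abs(acc - targetTPR);
        if gap < best.gap
            best = struct('nu', nu, 'gamma', g, 'gap', gap);
        end
    end
end
fprintf('Chosen nu = %g, gamma = %g\n', best.nu, best.gamma);
```

Note that this only tunes how many inliers are accepted; it says nothing about how well outliers are rejected, which needs labeled outliers in a separate validation set.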

Furthermore the README for the libsvm source says of the input data: "For classification, label is an integer indicating the class label ... For one-class SVM, it's not used so can be any number."

I think your outliers should not be included in the training data; libsvm will ignore the training labels anyway. What you are trying to do is find a hypersphere that contains the good data but not the outliers. If you train with outliers in the data, LIBSVM will try to find a hypersphere that includes the outliers, which is exactly what you don't want. So you will need a training dataset without outliers, a validation dataset with outliers for choosing parameters, and a final test dataset to see whether your model generalizes.
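
A minimal sketch of that split, using the data from the question and LIBSVM's MATLAB functions `svmtrain` and `svmpredict`. The values for `-n` and `-g` are untuned placeholders. Because the question only has four inliers, the validation step below reuses them, which you would not do with real data.

```matlab
% Data from the question: first four rows are inliers, last four are outliers.
data = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2];

trainX = data(1:4, :);                 % train on inliers only
trainY = ones(size(trainX, 1), 1);     % ignored for -s 2, but must be a double column vector

% RBF kernel; nu and gamma are placeholder values, not tuned.
model = svmtrain(trainY, trainX, '-s 2 -t 2 -n 0.5 -g 0.1 -q');

% Validation set with known labels: +1 = inlier, -1 = outlier.
% In practice, use held-out inliers here rather than the training rows.
valX = data;
valY = [ones(4, 1); -ones(4, 1)];
[pred, acc, ~] = svmpredict(valY, valX, model);

tpr = mean(pred(valY == 1) == 1);      % fraction of inliers accepted
tnr = mean(pred(valY == -1) == -1);    % fraction of outliers rejected
fprintf('TPR = %.2f, TNR = %.2f\n', tpr, tnr);
```

Repeating this for several `-n` and `-g` values and keeping the pair with the best validation TPR/TNR trade-off is the parameter-selection step; the final test set is then scored once with the chosen model.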

Bull