I am trying to use an SVM for binary classification on a very high-dimensional dataset, a 3249x40 matrix. I have five similar datasets. While the decision tree gives proper results (low, but different for each dataset), the SVM gives exactly the same result every time, whichever dataset I use. I use the SVM in the following way:

    svmModel = svmtrain(train_mat(trainIdx,:), groups(trainIdx), ...
        'Autoscale',true, 'Showplot',false, 'Method','QP', ...
        'BoxConstraint',2e-1, 'Kernel_Function','rbf', 'RBF_Sigma',1);
    pred = svmclassify(svmModel, train_mat(testIdx,:), 'Showplot',false);

What's wrong with it? I am using the decision tree like this:

    tree = ClassificationTree.fit(train_mat(trainIdx,:), groups(trainIdx,:));
    pred = tree.predict(train_mat(testIdx,:));

I am getting different results (which also appear correct) from those 5 datasets with the decision tree. What's wrong? Is it because an SVM cannot handle datasets that have very few observations compared to the number of variables?

carlosdc
MaxSteel

1 Answer

You will probably need to find a combination of C (what you call the box constraint) and sigma for the RBF kernel that works well. This is typically done through cross-validation. That is, split your training data in two; for each combination of box constraint and sigma, train on one half and test on the other, then train on the second half and test on the first, and average the two accuracies. For testing, use the box constraint and sigma combination that got the best accuracy.
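The procedure above could be sketched roughly as follows. This is only an illustration under some assumptions: `X` and `y` are hypothetical names for the training portion of the data (for example `train_mat(trainIdx,:)` and `groups(trainIdx)`), `y` is assumed to be a numeric column vector of labels, and the grid values are arbitrary placeholders rather than recommendations. It uses `svmtrain`/`svmclassify` as in the question; newer MATLAB releases replace these with `fitcsvm`.

```matlab
% Sketch: 2-fold cross-validated grid search over BoxConstraint (C) and RBF_Sigma.
% Assumption: X (observations x features) and y (numeric column vector of labels)
% hold the training data only, e.g. X = train_mat(trainIdx,:); y = groups(trainIdx);
Cs     = 10.^(-2:2);   % candidate box constraints (illustrative values)
sigmas = 10.^(-1:2);   % candidate RBF sigmas (illustrative values)

cv  = cvpartition(y, 'KFold', 2);        % stratified split into two halves
acc = zeros(numel(Cs), numel(sigmas));   % mean accuracy per (C, sigma) pair

for i = 1:numel(Cs)
    for j = 1:numel(sigmas)
        foldAcc = zeros(cv.NumTestSets, 1);
        for k = 1:cv.NumTestSets
            tr = training(cv, k);   % logical index: half used for training
            te = test(cv, k);       % logical index: half held out for testing
            model = svmtrain(X(tr,:), y(tr), 'Autoscale', true, ...
                'Kernel_Function', 'rbf', 'RBF_Sigma', sigmas(j), ...
                'BoxConstraint', Cs(i));
            pred = svmclassify(model, X(te,:));
            foldAcc(k) = mean(pred == y(te));   % accuracy on the held-out half
        end
        acc(i,j) = mean(foldAcc);   % average the two accuracies
    end
end

% Pick the combination with the best average accuracy.
[~, best] = max(acc(:));
[bi, bj]  = ind2sub(size(acc), best);
bestC     = Cs(bi);
bestSigma = sigmas(bj);
```

`bestC` and `bestSigma` would then be passed as `'BoxConstraint'` and `'RBF_Sigma'` when training the final model on all of `X` before classifying the test rows. Some parameter combinations may fail to converge, so in practice the `svmtrain` call may need a `try`/`catch` around it.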

This question covers many things to check: "Supprt Vector Machine works in matlab, doesn't work in c++".

carlosdc
  • Cross-validation is to obtain a combination of train and test data, right? That I am already doing here. – MaxSteel Apr 26 '13 at 17:11
  • No, to find on average what box constraint and gamma work best. – carlosdc Apr 26 '13 at 17:12
  • Can you explain the above a little more? Sorry to ask a trivial question like this. – MaxSteel Apr 26 '13 at 17:13
  • @MaxSteel: in this case, there will be two levels of nested cross-validation, one to get folds of train/test, and for each training subset, perform cross-validation to find best C/gamma values for that subset – Amro Apr 26 '13 at 17:29