SVM: why use only the first two feature columns?

Question

I found an SVM example online, and I do not understand why it uses only the first two columns of features. The data set is the famous spiral data in "spiral_Nc10_train.mat" and "spiral_Nc10_test.mat". "spiral_Nc10_train.mat" contains 1) data = 1000x3 double; 2) label = 1000x1 double. "spiral_Nc10_test.mat" contains 1) data = 500x3 double; 2) label = 500x1 double. The relevant part of the original code looks like this:

load(fullfile(dirData,'spiral_Nc10_train'));   % defines data (1000x3) and label (1000x1)
rawTrainData = data(:,1:2);   % line 2: keep only the first two feature columns
rawTrainLabel = label;
NTrain = size(rawTrainData,1);
[sortedTrainLabel, permIndex] = sortrows(rawTrainLabel);   % group samples by class
sortedTrainData = rawTrainData(permIndex,:);

load(fullfile(dirData,'spiral_Nc10_test'));   % defines data (500x3) and label (500x1)
rawTestData = data(:,1:2);    % line 8: keep only the first two feature columns
rawTestLabel = label;
NTest = size(rawTestData,1);
[sortedTestLabel, permIndex] = sortrows(rawTestLabel);   % group samples by class
sortedTestData = rawTestData(permIndex,:);

I tried changing line 2 and line 8 to:

rawTrainData = data(:,1:3);
rawTestData = data(:,1:3);

But then the result is wrong, and the final predicted labels are wrong as well. Can anyone tell me why the SVM can only be applied to two feature columns? Thank you so much!
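For comparison, here is a minimal sketch of the same loading and sorting step with the feature columns chosen in one place (`featCols`, a hypothetical name), so the training and test matrices cannot end up with different widths. Random data stands in for the two .mat files so the snippet runs on its own; it is not the spiral data, and with the real files the `load(fullfile(dirData, ...))` calls from the question would supply `data` and `label` instead.

```matlab
% Hypothetical stand-ins for the contents of spiral_Nc10_train/test.mat:
% N-by-3 feature matrices and N-by-1 labels in 1..10 (random, not spirals).
trainData = rand(1000, 3);  trainLabel = randi(10, 1000, 1);
testData  = rand(500, 3);   testLabel  = randi(10, 500, 1);

featCols = 1:3;   % 1:2 reproduces the original 2-D setup

% Training set: select the feature columns, then sort samples by class label.
rawTrainData = trainData(:, featCols);
[sortedTrainLabel, permIndex] = sortrows(trainLabel);
sortedTrainData = rawTrainData(permIndex, :);

% Test set: same columns, same sorting.
rawTestData = testData(:, featCols);
[sortedTestLabel, permIndex] = sortrows(testLabel);
sortedTestData = rawTestData(permIndex, :);

% Everything later passed to training, prediction or plotting code
% must have numel(featCols) columns as well.
assert(size(sortedTrainData, 2) == size(sortedTestData, 2));
```

Changing the column index here only fixes the input matrices; any later code that builds its own two-column inputs (for example for a plot) would still need the same change.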

It's not very famous because I've never heard of that dataset before. After a simple Google search, I was brought to this page: https://sites.google.com/site/kittipat/libsvm_matlab/complete_libsvm_example - It seems that's where you got the code from. The website explains that they arbitrarily chose to do it in 2D even though the set is in 3D. My guess is that they do it because it's easier to visualize. — rayryeng, Mar 01 '15 at 19:30
@rayryeng, if I use 3D, the final figure for the predicted labels is not the same. You can try this by simply changing it to 3D. If you need the dataset, I can email it to you. Thank you! — Angelababy, Mar 01 '15 at 20:28
That makes sense, because you're adding an extra dimension to the data set. Look up the curse of dimensionality: http://en.wikipedia.org/wiki/Curse_of_dimensionality. The extra dimension would probably make the classification worse because you may be overfitting. — rayryeng, Mar 01 '15 at 20:31
@rayryeng, if the dimensionality of the dataset is 10, how many columns should I use to do the classification, get a proper classification accuracy, and also avoid overfitting? — Angelababy, Mar 01 '15 at 20:47
That is unfortunately trial and error. I don't have an answer for you there. Sorry! — rayryeng, Mar 01 '15 at 21:13
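The trial and error can at least be made systematic by comparing feature subsets on k-fold cross-validated accuracy. A minimal sketch under loudly stated assumptions: the data are synthetic (by construction only feature 1 carries class information), and a 1-nearest-neighbour classifier stands in for the SVM so that no toolbox is needed. In practice the libsvm training and prediction calls would replace the 1-NN lines, and the resulting accuracies say nothing about the spiral data.

```matlab
% Synthetic data (hypothetical): 3 classes, D features, only feature 1 informative.
N = 300; D = 4; K = 5;
y = randi(3, N, 1);
X = [y + 0.3 * randn(N, 1), randn(N, D - 1)];

folds = mod(randperm(N)', K) + 1;   % random fold index 1..K for every sample
subsets = nchoosek(1:D, 2);         % all 2-feature subsets to compare
acc = zeros(size(subsets, 1), 1);

for s = 1:size(subsets, 1)
    cols = subsets(s, :);
    correct = 0;
    for k = 1:K
        tr = folds ~= k;
        te = folds == k;
        Xtr = X(tr, cols);  ytr = y(tr);
        Xte = X(te, cols);  yte = y(te);
        % Stand-in classifier (not an SVM): 1-nearest neighbour on
        % squared Euclidean distances between test and training points.
        d = bsxfun(@plus, sum(Xte .^ 2, 2), sum(Xtr .^ 2, 2)') - 2 * (Xte * Xtr');
        [~, nearest] = min(d, [], 2);
        correct = correct + sum(ytr(nearest) == yte);
    end
    acc(s) = correct / N;   % every sample is tested exactly once
end

[bestAcc, best] = max(acc);
fprintf('best subset: %s, cross-validated accuracy %.3f\n', ...
        mat2str(subsets(best, :)), bestAcc);
```

With 10 features there are 2^10 - 1 = 1023 non-empty subsets, so an exhaustive search over all subset sizes is still feasible when training is cheap; for larger feature counts a greedy forward or backward selection is the usual compromise.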
@rayryeng, I don't know, but 3 dimensions for 1000 training data points doesn't seem very "cursed" to me. :-) Normally that term is applied when there are more dimensions than training data points. — A. Donda, Mar 02 '15 at 13:54
@Angelababy, I would like to have a look, but so much custom code is involved, scraped in little bits and pieces from that website, that it is really tedious. However, I can assure you that SVMs can work with many more than 2 features, and I don't believe those spiral segments are harder to distinguish in 3D. I can only guess that you have a bug in your label-checking or plotting code, i.e. that the code needs to be adapted to 3 dimensions in more places than just those two lines. — A. Donda, Mar 02 '15 at 14:11
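One place where such a hidden 2-D assumption can live is the decision-region plot. If the plotting code predicts a label for every point of a 2-D grid, those grid points must have as many columns as the data the model was trained on. A minimal sketch, assuming (hypothetically) that the third feature is held at its training mean to draw a 2-D slice; random data replaces the spiral set, and the prediction call is left out because it depends on the SVM package in use.

```matlab
% Hypothetical stand-in for the 3-feature training data.
X = rand(1000, 3);

% Regular grid over the first two features, as a 2-D decision-region plot needs.
n = 100;
[g1, g2] = meshgrid(linspace(min(X(:,1)), max(X(:,1)), n), ...
                    linspace(min(X(:,2)), max(X(:,2)), n));

% Hold the third feature fixed (here at its mean) so that every grid point
% has the same number of columns as the training data.
g3 = mean(X(:,3)) * ones(size(g1));
gridPoints = [g1(:), g2(:), g3(:)];   % (n*n)-by-3

% gridPoints would be passed to the trained model's prediction function;
% the predicted labels can then be reshaped with reshape(predLabel, size(g1))
% and drawn with contourf or imagesc.
```

Such a plot shows only one slice of the 3-D decision regions, so it can legitimately look different from the 2-D version even when the classifier itself is fine.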
@A.Donda, thank you! For the spiral dataset, if I use 3D, the figure of the final predicted labels does not make sense. I will check it as you suggested. — Angelababy, Mar 02 '15 at 16:44
