I started off with two files, training and testing.

Then, using libsvm, I scaled both of those files to produce training.scale and testing.scale.
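For reference, the scaling step was along these lines (I'm sketching from memory; svm-scale's -s flag saves the scaling parameters learned from the training set, -r applies those same parameters to the test set, and range.txt is just a name I'm using for the parameter file):

```
# Learn scaling parameters from the training set and save them
svm-scale -s range.txt training > training.scale

# Apply the same parameters to the test set
svm-scale -r range.txt testing > testing.scale
```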

Then, using grid.py (part of libsvm's tools), I ran a grid search on training.scale and received some cross-validation values.
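The invocation was roughly this (grid.py performs a cross-validated grid search over C and gamma, 5-fold by default):

```
python grid.py training.scale
```

It reported: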

C = 512
gamma = 0.03125
5-fold cross-validation accuracy = 66.8421%

Then, running svm-train on training.scale with the parameters found by grid.py, I got a new file called training.scale.model.
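Concretely, that step looked something like this (svm-train defaults to the RBF kernel and names the model file after the input):

```
# Train with the C and gamma reported by grid.py;
# this writes training.scale.model
svm-train -c 512 -g 0.03125 training.scale
```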

I then ran svm-predict, which produced a new file called testing.predict and reported an accuracy of 60.8333%.
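That step was roughly the following (svm-predict writes one predicted label per line to the output file and prints the accuracy on the test set):

```
svm-predict testing.scale training.scale.model testing.predict
```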

Finally, comparing testing and testing.predict, I found that there were 47/120 misclassifications (which is consistent with the reported accuracy: 60.8333% of 120 is 73 correct, so 47 wrong).
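If anyone wants to reproduce the count, a quick bash one-liner will do it (this assumes testing is in the usual libsvm format with the true label as the first field, and that testing.predict contains one predicted label per line):

```
# Compare true labels against predictions and count mismatches
paste <(cut -d' ' -f1 testing) testing.predict \
  | awk '$1 != $2 { n++ } END { print n, "misclassifications" }'
```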

[Link to code][1]

[1]: https://drive.google.com/folderview?id=0BxzgP5V6RPQHekRjZXdFYW9GX0U&usp=sharing

The real question: is there any reason why these misclassifications occur?

P.S. I apologise for the bad formatting of this question; I've been up for too long.


1 Answer


I am guessing you are new to machine learning. The results you've got are entirely expected.

Why do these misclassifications occur? The features you've used don't separate your classes well; the 66% cross-validation score should have given you that hint. Assuming two roughly balanced classes, even plain guessing gets you about 50% accuracy, and the feature set you used only improved on that by another 16 percentage points or so. Try exploring new features.
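One quick sanity check for that 50% baseline, assuming your files are in the usual libsvm label-first format, is to look at the class balance of your training data; if the classes are roughly even, random guessing really does sit near 50%:

```
# Count how many training examples carry each class label
cut -d' ' -f1 training.scale | sort | uniq -c
```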

I'm assuming your data set is clean.
