I am trying to classify a dataset of about 5,000 records (about 1,000 of which carry positive labels) into 2 classes using an SVM. My code follows the scikit-learn example:
from sklearn import svm
clf = svm.SVC()
clf.fit(X, Y)
so I am using mostly the default values. The variance is very high: training accuracy is above 95%, while test accuracy, which I measure on about 50 records held out from the dataset, is only 50%.
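A minimal sketch of the setup I am describing, with a synthetic stand-in for my data (the `X`/`Y` placeholders and feature count are made up, not my real dataset):

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)
X = rng.randn(5000, 20)                 # ~5,000 records, 20 placeholder features
Y = (rng.rand(5000) < 0.2).astype(int)  # ~1,000 positive labels out of 5,000

# Hold out ~50 records for testing, train on the rest
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=50, random_state=0)

clf = svm.SVC()                         # default hyperparameters, as above
clf.fit(X_train, Y_train)

train_acc = accuracy_score(Y_train, clf.predict(X_train))
test_acc = accuracy_score(Y_test, clf.predict(X_test))
print(train_acc, test_acc)
```

On my real data this prints roughly 0.95+ for training and about 0.50 for the 50-record test set.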
However, if I change the train/test split to about 3,000 training records and 2,000 test records, the training accuracy drops to 80% and the test accuracy goes up. Why is this happening?
Now, if I switch from the SVM to scikit-learn's logistic regression, the percentages remain unchanged. Why is that?
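For the logistic regression comparison, this is the kind of swap I mean (again a sketch with synthetic placeholders for `X` and `Y`, using the 3,000/2,000 split):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.randn(5000, 20)                 # placeholder features
Y = (rng.rand(5000) < 0.2).astype(int)  # ~1,000 positive labels

# ~3,000 training records, 2,000 test records
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=2000, random_state=0)

clf = LogisticRegression()              # only the classifier changes
clf.fit(X_train, Y_train)

train_acc = clf.score(X_train, Y_train)
test_acc = clf.score(X_test, Y_test)
print(train_acc, test_acc)
```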