
I've been trying to fit some data and make predictions on it. I'm using the SVC class in sklearn to train a model. My problem is that my data are very complicated and I don't know how to classify them. I'm uploading a 3D figure here. The dataset has about 800 rows with 3 columns. I used gamma=100 and C=10.0, and after splitting the dataset and testing, I got accuracies between 61.0 and 64.0 percent, but I think I can do better. I set the kernel to 'rbf', and after some tests I concluded that 'rbf' is a good choice. However, after reading the SVM documentation here and the kernel functions here, I got confused. Here are my questions:

1. Which kernel should I use (based on my dataset, which is uploaded here)?
2. What other parameters should I change for a classification task?

Please help me get good accuracy. Here is my dataset:

import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

# X (about 800 rows x 3 columns) and labels are the poster's own data.
model = svm.SVC(C=1.0, gamma=100, kernel='rbf')
X_train, X_test, y_train, y_test = train_test_split(X, labels)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred)
print('\n\n\n', y_test, '\n\n\n')
# Fraction of test samples that were predicted correctly (accuracy)
print((np.asarray(y_test) == y_pred).mean())
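A single train_test_split gives one noisy accuracy estimate, which may be part of why repeated runs land anywhere between 61 and 64 percent. A minimal sketch of a more stable estimate, assuming stratified 5-fold cross-validation is acceptable: it scores the same SVC configuration on every fold and reports the mean and spread. Because the real data is not in the post, the `make_classification` call below is a hypothetical stand-in with the same shape (800 rows, 3 features); swap in your own `X` and `labels`.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in for the poster's data: 800 rows, 3 feature columns.
X, labels = make_classification(n_samples=800, n_features=3, n_informative=3,
                                n_redundant=0, random_state=0)

# Same configuration as in the question's code.
model = svm.SVC(C=1.0, gamma=100, kernel='rbf')

# Each of the 5 folds keeps the class proportions of the full dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, labels, cv=cv, scoring='accuracy')

print(f"accuracy per fold: {scores.round(3)}")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Comparing the mean across configurations is less sensitive to a lucky or unlucky split than comparing single test-set scores.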


Just a note: you did not actually provide any dataset, only the source code.

Using a different kernel seems like a good idea. From that image alone it's really hard to say which kernel will perform better than the others; the choice of kernel usually requires some intuition or domain knowledge, so it's hard to say offhand.

Since scikit-learn's SVC has only 4 built-in kernels ('linear', 'poly', 'rbf' and 'sigmoid'), I think you should just try all of them and compare them, ideally using cross-validation, to see which performs best. Some of the kernels are parameterized; for the polynomial kernel you can try multiple degrees, up to 10. Using a degree higher than 10 for the polynomial kernel might not help at all, but that's just my guess.
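A minimal sketch of that comparison, assuming 5-fold stratified cross-validation and default values for every other SVC parameter. It scores the linear, RBF and sigmoid kernels plus the polynomial kernel at degrees 2 through 10 (degree 1 is close to the linear case), then prints them from best to worst mean accuracy. The `make_classification` data is a hypothetical stand-in for the dataset, which isn't in the post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Hypothetical stand-in for the real data: 800 rows, 3 feature columns.
X, labels = make_classification(n_samples=800, n_features=3, n_informative=3,
                                n_redundant=0, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One candidate per non-polynomial kernel ...
candidates = {kernel: SVC(kernel=kernel) for kernel in ('linear', 'rbf', 'sigmoid')}
# ... and one polynomial candidate per degree from 2 to 10.
for degree in range(2, 11):
    candidates[f'poly (degree={degree})'] = SVC(kernel='poly', degree=degree)

# Mean cross-validated accuracy for each candidate.
results = {name: cross_val_score(model, X, labels, cv=cv).mean()
           for name, model in candidates.items()}

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name:20s} {acc:.3f}')
```

Using the same `cv` splitter object for every candidate means all kernels are scored on identical folds, so the differences come from the kernel rather than from the split.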

You should also try different values for the C parameter. In most machine learning algorithms, the constants that weight the individual terms of a multi-term loss (here, the margin term versus the misclassification penalty) have a "multiplicative" impact (for lack of a better word), so I advise searching on a logarithmic scale, for example with the following values for C: [1e-3, 1e-2, 1e-1, 1, 10, 100]
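A minimal sketch of that search with scikit-learn's `GridSearchCV`, assuming the RBF kernel with its default `gamma` and 5-fold stratified cross-validation. `np.logspace(-3, 2, 6)` produces exactly the six values listed above, which makes the logarithmic spacing explicit. The `make_classification` data is again a hypothetical stand-in for the poster's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Hypothetical stand-in for the real data: 800 rows, 3 feature columns.
X, labels = make_classification(n_samples=800, n_features=3, n_informative=3,
                                n_redundant=0, random_state=0)

# 1e-3, 1e-2, 1e-1, 1, 10, 100: each value is 10x the previous one.
C_values = np.logspace(-3, 2, 6)

search = GridSearchCV(
    SVC(kernel='rbf'),
    param_grid={'C': C_values},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring='accuracy',
)
search.fit(X, labels)

print('best C:', search.best_params_['C'])
print(f'best mean CV accuracy: {search.best_score_:.3f}')
```

The same `param_grid` can take a `'kernel'` list or a `'gamma'` list as well, in which case `GridSearchCV` tries every combination; the number of fits grows multiplicatively with each added list.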

Matěj Račinský
  • Thanks for your answer. I uploaded my dataset with its labels in a text file here. I also created an HTML animation here to make the visualization better. – mahyar sadeghi Mar 15 '19 at 09:41