Hello machine learning experts, I am new to machine learning topics. My data has six features (6 regular attributes) and one binary label (1 special attribute with values true/false; I hope I used the right terms). I want to combine those features into a single score, with weights trained by an SVM. The data looks like this:
```
ZDis ZAnch ZSurf Zval ZDom ZEntropy Top5
0.48659 -0.20412 1.19243 0.15374 0.59667 1.34151 False
-0.10067 4.89898 -0.73677 0.22506 0.59667 1.34151 True
2.24837 -0.20412 -2.02291 0.22455 0.59667 1.34151 False
0.48659 -0.20412 1.19243 -0.06352 0.59667 1.34151 False
-0.68793 -0.20412 1.19243 0.12405 0.59667 1.34151 False
-2.02698 -0.40825 1.86371 0.07348 1.3272 -0.1242 False
-0.1807 2.44949 0.17865 0.07345 0.9401 0.1505 False
1.66557 2.44949 -1.50641 0.07381 0.9401 1.30135 False
1.11169 -0.40825 0.34716 0.07381 0.9401 -0.20225 True
1.5337 -0.40825 -0.01393 0.07381 -0.9954 0.53144 False
-0.01945 -0.48348 -1.16128 0.11035 2.02339 0.90237 False
-1.52944 3.23556 0.23428 0.11093 1.22613 -0.12973 False
0.43354 -0.48348 -2.20795 0.11093 1.22613 2.25734 False
2.84953 -0.48348 -2.20795 0.11093 1.49189 3.07609 True
```
So what I want here is total = X1*ZDis + X2*ZAnch + X3*ZSurf + X4*Zval + X5*ZDom + X6*ZEntropy, where X1..X6 are weights that should come from the SVM. I used RapidMiner to get these weights for my 40-example training set, and the result is below:
Total number of Support Vectors: 40
Bias (offset): -1.055
w[ZDis] = 0.076
w[ZAnch] = -0.058
w[ZSurf] = 0.057
w[Zval] = 0.010
w[ZDom] = 0.073
w[ZEntropy] = 0.077
I am not sure whether my approach is correct, so I need your kind help. Thanks in advance. Also, if someone could guide me on how to write Python code for this SVM problem, that would be helpful too.
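Meanwhile, here is my own attempt at a Python translation using scikit-learn, so please correct me if it is wrong. A minimal sketch, assuming the 40 examples sit in a CSV file with the column names from my table above (`training.csv` is a placeholder name, not my real file):

```python
import pandas as pd
from sklearn.svm import SVC

# Placeholder file name -- assumed to hold the 40 training examples above
data = pd.read_csv("training.csv")
features = ["ZDis", "ZAnch", "ZSurf", "Zval", "ZDom", "ZEntropy"]
X = data[features]
y = data["Top5"]  # True/False label

# Linear SVM, roughly corresponding to the RapidMiner model above
model = SVC(kernel="linear", C=1.0)
model.fit(X, y)

# Learned weights X1..X6 and the bias (offset), as in the output above
for name, w in zip(features, model.coef_[0]):
    print(f"w[{name}] = {w:.3f}")
print(f"bias = {model.intercept_[0]:.3f}")

# total = X1*ZDis + ... + X6*ZEntropy + bias, for every example
print(model.decision_function(X))
```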
Thanks, Pallab
After getting feedback from you, I did some further analysis of my problem. I now have 277 examples with 8 features, of which 8 are positive and 269 are negative, so it is clearly an imbalanced dataset. As I said before, I want to weight my features using the SVM weights and then compute (w1*x1 + w2*x2 + ... + w8*x8), which should help me extract the true results from my dataset (see my Python attempt after the data below). The data looks like this:
```
NameOfMotif eval_Zscore dis_Zscore abind_Zscore surf_Zscore pfam_Zscore ptm_Zscore coil_Zscore entropy_Zscore TrueVsFalse
ptk_9 0.77428 0.2387 -0.39736 1.48274 0.61237 -0.21822 0.49111 0.44599 False
ptk_8 0.77494 -0.97317 -0.39736 -0.27357 -1.63299 -0.21822 0.6181 -0.04028 False
ptk_3 0.77591 1.45058 -0.39736 -0.1139 0.61237 4.58258 0.74509 -0.85069 True
ptk_6 0.77583 -2.18505 -0.39736 -0.27357 0.61237 -0.21822 -0.3343 -0.92281 False
ptk_22 0.55932 1.45058 -0.39736 0.70216 0.61237 -0.21822 1.25303 -2.17556 False
ptk_23 0.51159 -0.97317 -0.39736 1.05697 -1.63299 -0.21822 1.25303 0.77021 False
ptk_20 0.62907 0.2387 -0.39736 1.05697 0.61237 -0.21822 -0.22848 -1.21702 False
..............................................................................
scf-trcp1_1 0.17425 2.23675 -0.92125 -0.03478 1.20877 5.13288 1.31262 2.27655 True
scf-trcp1_3 0.17425 -1.068 -0.92125 -0.82472 -2.43745 -0.43743 0.48341 -0.59339 False
scf-trcp1_5 0.17425 0.41914 0.24523 -1.05041 0.23644 -0.43743 -0.02919 1.68523 False
scf-trcp1_7 0.17425 -1.63453 -0.92125 -1.25354 -1.82975 -0.43743 -2.0193 0.95051 False
```
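As promised above, here is my rough Python attempt for the imbalanced case. A sketch under my assumptions: the 277 rows are in a CSV (`motifs.csv` is a placeholder name), and from the scikit-learn docs `class_weight="balanced"` seems to play a similar role to the L-pos/L-neg weights in RapidMiner, but please correct me:

```python
import pandas as pd
from sklearn.svm import SVC

# Placeholder file name -- assumed to hold the 277 examples above
data = pd.read_csv("motifs.csv")
features = ["eval_Zscore", "dis_Zscore", "abind_Zscore", "surf_Zscore",
            "pfam_Zscore", "ptm_Zscore", "coil_Zscore", "entropy_Zscore"]
X = data[features]
y = data["TrueVsFalse"]

# "balanced" reweights errors inversely to class frequency, so the
# 8 positives are not drowned out by the 269 negatives
model = SVC(kernel="linear", C=1.0, class_weight="balanced")
model.fit(X, y)

# Weights w1..w8 for the combination w1*x1 + w2*x2 + ... + w8*x8
for name, w in zip(features, model.coef_[0]):
    print(f"w[{name}] = {w:.3f}")
```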
and my SVM output is:
kernel type = polynomial
cross-validation folds = 5
C = 100000.0
kernel degree = 1.0E-4
L-pos = 2.0
L-neg = 2.0
PerformanceVector:
accuracy: 84.60% +/- 23.58% (mikro: 84.48%)
ConfusionMatrix (identical for all three measures, so shown once; rows = predicted class, columns = true class):
             False   True
False:        228      2
True:          41      6
precision: 31.08% +/- 25.51% (mikro: 12.77%) (positive class: True)
recall: 70.00% +/- 40.00% (mikro: 75.00%) (positive class: True)
AUC (optimistic): 0.793 +/- 0.184 (mikro: 0.793) (positive class: True)
AUC: 0.793 +/- 0.184 (mikro: 0.793) (positive class: True)
AUC (pessimistic): 0.793 +/- 0.184 (mikro: 0.793) (positive class: True)
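And this is how I tried to reproduce the cross-validated evaluation in Python. Again a sketch under my assumptions: `motifs.csv` is the same placeholder file as above, I set `degree=1` because scikit-learn expects an integer polynomial degree (I am not sure my "kernel degree = 1.0E-4" maps onto it), and I used `class_weight="balanced"` instead of L-pos/L-neg, so the numbers will not match exactly:

```python
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_validate

data = pd.read_csv("motifs.csv")  # same placeholder file as above
X = data[[c for c in data.columns if c.endswith("_Zscore")]]
y = data["TrueVsFalse"]

# Polynomial kernel with a large C, as in the RapidMiner run above;
# degree=1 is an assumption (scikit-learn needs an integer degree)
model = SVC(kernel="poly", degree=1, C=1e5, class_weight="balanced")

# Stratified folds keep some of the 8 positives in every test fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["accuracy", "precision", "recall", "roc_auc"])
for metric in ["accuracy", "precision", "recall", "roc_auc"]:
    s = scores[f"test_{metric}"]
    print(f"{metric}: {s.mean():.2%} +/- {s.std():.2%}")
```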
My question here is: is my approach good enough now? Are all the parameters I used to optimize the SVM fine? I am very much a novice on this issue!! Thanks, Pallab