5

I'm currently using svc to separate two classes of data (the features below are named data and the labels are condition). After fitting the data using the gridSearchCV I get a classification score of about .7 and I'm fairly happy with that number. After that though I went to get the relative distances from the hyper-plane for data from each class using grid.best_estimator_.decision_function() and plot them in a boxplot and a histogram to get a better idea of how much overlap there is. My problem is that in the histogram and the boxplot these look perfectly seperable shich I know is not the case. I'm sure I'm calling decision_function() incorrectly but not sure how to do this really.

    svc=SVC(kernel='linear,probability=True,decision_function_shape='ovr')
cv=KFold(n_splits=4,shuffle=True)
svc=SVC(kernel='linear,probability=True,decision_function_shape='ovr')
C_range=[1,.001,.005,.01,.05,.1,.5,5,50,10,100]
param_grid=dict(C=C_range)
grid=GridSearchCV(svc,param_grid=param_grid, cv=cv,n_jobs=4,iid=False, refit=True)
grid.fit(data,condition)
print grid.best_params
print grid.best_score_

x=grid.best_estimator_.decision_function(data)
plt.hist(x)    
sb.boxplot(condition,x)
sb.swarmplot

In the histogram and box plots it looks like almost all of the points have a distance of either exactly positive or negative one with nothing between them.

0 Answers0