
I am working on a supervised machine learning algorithm and it seems to show some curious behavior. So, let me start:

I have a function to which I pass different classifiers, their parameters, the training data and its labels:

from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV

def HT(targets, train_new, algorithm, parameters):
    # create my scorer
    scorer = make_scorer(f1_score)
    # create the grid search object with the parameters passed to the function
    grid_search = GridSearchCV(algorithm, param_grid=parameters, scoring=scorer, cv=5)
    # fit the grid_search object to the data
    grid_search.fit(train_new, targets.ravel())
    # print the name of the classifier, the best score and the best parameters
    print(algorithm.__class__.__name__)
    print('Best score: {}'.format(grid_search.best_score_))
    print('Best parameters: {}'.format(grid_search.best_params_))
    # assign the best estimator to the pipeline variable
    pipeline = grid_search.best_estimator_
    # predict the results for the training set
    results = pipeline.predict(train_new).astype(int)
    print(results)
    return pipeline

To this function I pass parameters like:

clf_param.append({'C': np.array([0.001, 0.01, 0.1, 1, 10]),
                  'kernel': ['linear', 'rbf'],
                  'decision_function_shape': ['ovr']})
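
I then call it along these lines (just a sketch: SVC is an example classifier, and the toy data stands in for my real train_new and targets):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# toy stand-in for my real training data and labels
train_new, targets = make_classification(n_samples=200, random_state=0)

clf_param = [{'C': np.array([0.001, 0.01, 0.1, 1, 10]),
              'kernel': ['linear', 'rbf'],
              'decision_function_shape': ['ovr']}]

pipeline = HT(targets, train_new, SVC(), clf_param[0])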

OK, so here is where things start to get strange. This function returns an f1_score, but it is different from the score I compute manually using the formula: F1 = 2 * (precision * recall) / (precision + recall)

The differences are pretty big (0.68 compared with 0.89).
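
Concretely, the manual check is along these lines (a sketch with toy labels, not my exact code or data):

from sklearn.metrics import confusion_matrix

# toy labels and predictions, just to show how I apply the formula
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / float(tp + fp)
recall = tp / float(tp + fn)
f1_manual = 2 * (precision * recall) / (precision + recall)
print(f1_manual)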

Am I doing something wrong in the function? Should the score computed by grid_search (grid_search.best_score_) be the same as the score on the whole training set (grid_search.best_estimator_.predict(train_new))? Thanks

Vlad
  • Please specify how you are manually calculating the score. Is this a binary or multilabel classification? – Vivek Kumar Apr 13 '17 at 16:35
  • Also change the question title to something more appropriate which is related to the difference in scores. Current title is of very little concern to your actual problem – Vivek Kumar Apr 13 '17 at 16:41

1 Answer


The score that you are calculating manually takes into account the global true positives and negatives across all classes. But in scikit-learn's f1_score, the default is average='binary', i.e. the score is computed only for the positive class.

So, in order to achieve the same scores, use the f1_score as specified below:

scorer = make_scorer(f1_score, average='micro')

Or simply, in GridSearchCV, use:

scoring = 'f1_micro'
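
To illustrate the difference, here is a small toy comparison (made-up labels, not the question's data):

from sklearn.metrics import f1_score

# made-up binary labels and predictions, purely for illustration
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1, 1]

print(f1_score(y_true, y_pred))                   # default average='binary': 0.8
print(f1_score(y_true, y_pred, average='micro'))  # global TP/FP/FN counts: 0.75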

More information about how the averaging of scores is done is given here: http://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values

You may also want to take a look at the following answer, which describes the calculation of scores in scikit-learn in detail.

EDIT: Changed macro to micro. As written in the documentation:

'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.

Vivek Kumar
  • Thanks for the answer, Vivek. My problem is a binary classification one. I know the training data and the labels, and I am applying the formula. Also, after the grid search is performed, do I need to fit the model again with the best parameters on the whole training set in order to make predictions? I assume that the grid search, since it does cross-validation, returns a classifier fit to only part of the training set. – Vlad Apr 13 '17 at 18:41
  • @Vlad No. The GridSearchCV estimator will refit the whole training data with the best parameters. You can look at the documentation: there is a "refit" parameter in its constructor, which is True by default, so it will refit the best parameters on all the data supplied to it (a small sketch of this follows below). – Vivek Kumar Apr 13 '17 at 18:47
  • Thanks Vivek. Great help – Vlad Apr 13 '17 at 19:58
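
A minimal sketch of the refit behaviour described in the comment above (toy data standing in for the question's train_new and targets): with refit=True, the default, GridSearchCV refits the best parameter combination on the full training data, so best_estimator_ can be used for prediction directly.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# toy data standing in for the question's train_new / targets
X, y = make_classification(n_samples=200, random_state=0)

grid_search = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10]},
                           scoring='f1_micro', cv=5, refit=True)  # refit=True is the default
grid_search.fit(X, y)

# best_estimator_ is already refit on the full data; no manual refit is needed
predictions = grid_search.best_estimator_.predict(X)
print(grid_search.best_params_)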