I would like to create a custom scorer in scikit-learn that I can pass to GridSearchCV, one that evaluates model performance based on the accuracy of predictions for a particular class.

Suppose that my training data consists of data points belonging to one of three classes:

'dog', 'cat', 'mouse'

from sklearn import ensemble
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import StratifiedKFold

# Training data and ground-truth labels ('dog', 'cat' or 'mouse'):
X = training_data
y = ground_truths

# Create a classifier:
clf = ensemble.RandomForestClassifier()

# Set up some parameters to explore:
param_dist = {
    'n_estimators': [500, 1000, 2000, 4000],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
}

# Construct the grid search (y must already be defined, since
# StratifiedKFold takes the labels directly):
search = GridSearchCV(clf,
                      param_grid=param_dist,
                      cv=StratifiedKFold(y, n_folds=10),
                      scoring=my_scoring_function)

# Perform the search:
search.fit(X, y)

Is there a way to construct my_scoring_function so that only the accuracy of predictions for the 'dog' class is returned? The make_scorer function seems limited, in that it only deals with the ground truth and the predicted class for each data point.

Many thanks in advance for your help!

Dman2
  • What's the problem with using `make_scorer`? It looks like you just need to equate all non-dog classes and calculate accuracy. – Artem Sobolev Feb 04 '15 at 17:56

1 Answer

I missed a section in the sklearn documentation.

You can create a function that takes the inputs (model, x_test, y_test) and returns a single number, where a higher value indicates a better model; this can be passed straight to GridSearchCV as the scoring (optimisation) function.

Simply create the function, call model.predict(x_test) inside it, and then analyse the predictions with a metric such as accuracy, restricted to whichever class you care about.
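For example, here is a minimal sketch of such a scorer (the name dog_scorer is my own, and I am assuming the labels are the strings used above). The per-class accuracy for 'dog' is the fraction of true 'dog' samples that the model predicts as 'dog', which is the recall of that class:

import numpy as np

def dog_scorer(model, x_test, y_test):
    # Accuracy restricted to the true 'dog' samples, i.e. the
    # recall of the 'dog' class. Higher is better.
    y_test = np.asarray(y_test)
    y_pred = model.predict(x_test)
    dogs = (y_test == 'dog')
    return np.mean(y_pred[dogs] == y_test[dogs])

Passing scoring=dog_scorer to GridSearchCV then optimises the hyper-parameters for 'dog' accuracy alone; since StratifiedKFold keeps every class represented in every fold, the dogs mask should never be empty. The make_scorer route suggested in the comment above should also work, e.g. make_scorer(recall_score, labels=['dog'], average='macro') with recall_score from sklearn.metrics.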

Dman2