I am facing a false-positive-reduction problem, where the ratio of positive to negative samples is approximately 1.7:1. I learned from the answer that precision, recall, F-score, or even weighting true positives, false positives, true negatives, and false negatives differently according to their cost, can be used to evaluate different models on this kind of classification task.
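For the cost-weighting part, what I have in mind is something like the sketch below, built on scikit-learn's confusion_matrix; the cost values here are made up purely for illustration:

from sklearn.metrics import confusion_matrix

# Hypothetical per-outcome costs; real values would come from the application.
COST_FP = 1.0   # cost of a false positive
COST_FN = 5.0   # cost of a false negative

def weighted_cost(y_true, y_pred):
    # For binary labels, confusion_matrix(...).ravel() yields (tn, fp, fn, tp).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return COST_FP * fp + COST_FN * fn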
Since precision, recall, and F-score were removed from Keras, I found some methods to track those metrics during training, such as the GitHub repo keras-metrics.
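As far as I can tell from the keras-metrics README, usage looks roughly like this (the one-layer model is just a placeholder):

import keras
import keras_metrics

# Minimal placeholder model, only to show where the metrics plug in.
model = keras.models.Sequential()
model.add(keras.layers.Dense(1, activation="sigmoid", input_dim=2))

model.compile(optimizer="sgd",
              loss="binary_crossentropy",
              metrics=[keras_metrics.precision(), keras_metrics.recall()])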
Besides, I also found other solutions that define precision manually, like this:
from keras import backend as K

def precision(y_true, y_pred):
    """Precision metric.

    Only computes a batch-wise average of precision.

    Computes the precision, a metric for multi-label classification of
    how many selected items are relevant.
    """
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision
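A quick sanity check with made-up tensors (passing `precision` in the `metrics` list of `compile` then works the same way as with keras-metrics above):

import numpy as np
from keras import backend as K

# Made-up labels/predictions: 2 true positives out of 3 predicted positives.
y_true = K.variable(np.array([1, 0, 1, 1], dtype="float32"))
y_pred = K.variable(np.array([1, 1, 0, 1], dtype="float32"))
print(K.eval(precision(y_true, y_pred)))  # ~0.667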
However, all of those methods track the metrics during training, and all of them state that they compute a batch-wise average rather than a global value. I wonder how necessary it is to keep track of those metrics during training. Or should I just focus on the loss and accuracy during training, and then evaluate all the models with validation functions from, e.g., scikit-learn, which compute those metrics globally?
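By the global approach I mean something like this, assuming `model` is a trained Keras binary classifier and `x_val`, `y_val` are the held-out validation data:

from sklearn.metrics import precision_score, recall_score, f1_score

# Assumed: `model` is a fitted Keras binary classifier and
# `x_val`, `y_val` are the held-out validation set.
y_prob = model.predict(x_val)
y_pred = (y_prob > 0.5).astype("int32").ravel()

print("precision:", precision_score(y_val, y_pred))
print("recall:   ", recall_score(y_val, y_pred))
print("f1 score: ", f1_score(y_val, y_pred))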