
Is there a way to get per-class precision or recall when doing multiclass classification using TensorFlow?

For example, if I have y_true and y_pred from each batch, is there a functional way to get precision or recall per class if I have more than 2 classes?

prateek agrawal

7 Answers


Here's a solution that works for me for a problem with n = 6 classes. If you have many more classes, this solution will probably be slow, and you should use some sort of mapping instead of a loop.

Assume you have one-hot encoded class labels in the rows of tensor labels and logits (or posteriors) in tensor logits. Then, if n is the number of classes, try this:

# Convert one-hot labels and logits to integer class indices
y_true = tf.argmax(labels, 1)
y_pred = tf.argmax(logits, 1)

recall = [0] * n
update_op_rec = [[]] * n

# One one-vs-all recall per class: class k is treated as "positive",
# everything else as "negative"
for k in range(n):
    recall[k], update_op_rec[k] = tf.metrics.recall(
        labels=tf.equal(y_true, k),
        predictions=tf.equal(y_pred, k)
    )

Note that inside tf.metrics.recall, the labels and predictions arguments are boolean vectors, just as in the binary case, which is what allows the use of the function here.
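
To actually read these values you still need to run the update ops over your data and then fetch the metric tensors. Here is a minimal sketch of that evaluation loop, assuming TF 1.x graph mode and a hypothetical num_batches; how labels and logits are fed depends on your input pipeline:

with tf.Session() as sess:
    # tf.metrics state lives in local variables, so initialize those
    sess.run(tf.local_variables_initializer())

    # Accumulate counts batch by batch (add a feed_dict if you use placeholders)
    for _ in range(num_batches):
        sess.run(update_op_rec)

    # Read the final per-class recall values
    print(sess.run(recall))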

Avi

Two facts:

  1. As stated in other answers, TensorFlow's built-in metrics precision and recall don't support multi-class inputs (the docs say they will be cast to bool).

  2. There are ways of getting one-versus-all scores by using precision_at_k with the appropriate class_id, or simply by casting your labels and predictions to tf.bool in the right way.

Because this is unsatisfying and incomplete, I wrote tf_metrics, a simple package for multi-class metrics that you can find on GitHub. It supports multiple averaging methods, like scikit-learn.

Example

import tensorflow as tf
import tf_metrics

y_true = [0, 1, 0, 0, 0, 2, 3, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 2, 0, 3, 3, 1]
pos_indices = [1]        # Metrics for class 1 -- or
pos_indices = [1, 2, 3]  # Average metrics, 0 is the 'negative' class
num_classes = 4
average = 'micro'

# Tuple of (value, update_op)
precision = tf_metrics.precision(
    y_true, y_pred, num_classes, pos_indices, average=average)
recall = tf_metrics.recall(
    y_true, y_pred, num_classes, pos_indices, average=average)
f2 = tf_metrics.fbeta(
    y_true, y_pred, num_classes, pos_indices, average=average, beta=2)
f1 = tf_metrics.f1(
    y_true, y_pred, num_classes, pos_indices, average=average)
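
As with the built-in tf.metrics, each of these is a (value, update_op) pair, so reading the numbers follows the usual TF 1.x pattern; a minimal sketch for the in-memory lists above:

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())

    # Run the update ops to accumulate counts...
    sess.run([precision[1], recall[1], f2[1], f1[1]])

    # ...then read the accumulated metric values
    print(sess.run([precision[0], recall[0], f2[0], f1[0]]))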
LeCodeDuGui

I believe you cannot do multiclass precision, recall, and F1 with the tf.metrics.precision/recall functions. You can use sklearn like this for a 3-class scenario:

from sklearn.metrics import precision_recall_fscore_support as score

prediction = [1,2,3,2] 
y_original = [1,2,3,3]

precision, recall, f1, _ = score(y_original, prediction)

print('precision: {}'.format(precision))
print('recall: {}'.format(recall))
print('fscore: {}'.format(f1))

This will print an array of per-class precision and recall values (here, precision [1.0, 0.5, 1.0] and recall [1.0, 1.0, 0.5] for classes 1, 2, and 3); format them however you like.
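
If you want a single averaged number per metric instead of one value per class, the same function can average across classes for you; a small sketch (average='macro' weights every class equally, 'weighted' weights by class frequency):

# Macro-averaged (unweighted mean over classes) precision/recall/F1
precision, recall, f1, _ = score(y_original, prediction, average='macro')
print('macro precision: {}'.format(precision))
print('macro recall: {}'.format(recall))
print('macro fscore: {}'.format(f1))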

Gun2sh

I was puzzled by this problem for quite a long time. I know it can be solved with sklearn, but I really wanted to solve it with TensorFlow's API, and by reading its code I finally figured out how this API works.

tf.metrics.precision_at_k(labels, predictions, k, class_id)
  • Firstly, let's assume this is a 4-class problem.
  • Secondly, we have two samples whose labels are 3 and 1 and whose predictions are [0.5,0.3,0.1,0.1] and [0.5,0.3,0.1,0.1]. According to these predictions, both samples are predicted as class 1.
  • Thirdly, if you want the precision of class 1, use the formula TP/(TP+FP); here we expect the result to be 1/(1+1)=0.5. Both samples have been predicted as class 1, but one of them is actually class 3, so TP is 1, FP is 1, and the result is 0.5.
  • Finally, let's use this API to verify our assumption.

    import tensorflow as tf

    # Labels are 0-based class indices: [[2],[0]] are the classes called
    # "3" and "1" above (see the NOTICE below)
    labels = tf.constant([[2],[0]],tf.int64)
    predictions = tf.constant([[0.5,0.3,0.1,0.1],[0.5,0.3,0.1,0.1]])

    # class_id=0 is the class called "1" above
    precision, update = tf.metrics.precision_at_k(labels, predictions, 1, class_id=0)

    sess = tf.Session()
    sess.run(tf.local_variables_initializer())

    # Run the update op to accumulate counts, then read the metric value
    sess.run(update)
    print(sess.run(precision)) # 0.5

NOTICE

  • k isn't the number of classes. It is the top-k cutoff, i.e. how many of the highest-scoring predictions are considered, so k must not exceed the last dimension of predictions (the number of classes).

  • class_id represents the class for which we want binary metrics.

  • With k=1, only the single top prediction is considered, which is what we want here: effectively a binary (one-vs-all) classification for the chosen class_id. With a larger k, a sample would count as positive whenever the class appears anywhere in its top-k predictions, which is not the per-class metric we are after.

  • One more important thing: to get the right result, the labels you feed in should be 0-based class indices (i.e. subtract 1 if your labels start at 1), because class_id refers to the index of the class and indexing starts at 0.

Hong Lan

There is a way to do this in TensorFlow.

tf.metrics.precision_at_k(labels, predictions, k, class_id)

Set k = 1 and set the corresponding class_id. For example, class_id=0 calculates the precision of the first class.
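
If you want the metric for every class rather than a single one, you can build one such op per class_id. A minimal sketch, assuming TF 1.x, integer labels of shape [batch, 1], score vectors predictions of shape [batch, num_classes], and a hypothetical num_classes:

precisions, recalls = {}, {}
for cid in range(num_classes):
    # One one-vs-all (value, update_op) pair per class
    precisions[cid] = tf.metrics.precision_at_k(labels, predictions, k=1, class_id=cid)
    recalls[cid] = tf.metrics.recall_at_k(labels, predictions, k=1, class_id=cid)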

Nandeesh

I believe TF does not provide such functionality yet. As per the docs (https://www.tensorflow.org/api_docs/python/tf/metrics/precision), both the labels and the predictions will be cast to bool, so it relates only to binary classification. Perhaps it's possible to one-hot encode the examples and it would work, but I'm not sure about this.

AVCarreiro
    Again, these functions do not compute metrics separately for each class, as the question asks. If certain classes appear in the data more frequently than others, these metrics will be dominated by those frequent classes. What is generally desired is to compute a separate recall and precision for each class and then to average them across classes to get overall values (similar to `tf.metrics.mean_per_class_accuracy`). The values will likely be different from what is obtained using `tf.metrics.recall` and `tf.metrics.precision` with imbalanced data. – Avi Jan 09 '18 at 19:02
    Actually, I was mistaken; `tf.metrics.mean_per_class_accuracy` does something different and isn't a good reference for this question. – Avi Jan 25 '18 at 21:55

Here's a complete example going from predicting in TensorFlow to reporting via scikit-learn:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report

# given a trained model `model`, test vectors `X_test` and true labels `y_test`,
# where `y_test` and `y_predicted` are integer class indices whose names are
# listed in `labels`
y_predicted = tf.argmax(model.predict(X_test), axis=1)

# Confusion matrix
cf = tf.math.confusion_matrix(y_test, y_predicted)
plt.matshow(cf, cmap='magma')
plt.colorbar()
plt.xticks(np.arange(len(labels)), labels=labels, rotation=90)
plt.yticks(np.arange(len(labels)), labels=labels)
plt.clim(0, None)

# Report
print(classification_report(y_test, y_predicted, target_names=labels))
Danielle Madeley