
This may be more of a TensorFlow gradient question. I have been attempting to implement Intersection over Union (IoU) as a loss and have been running into some problems. To get to the point, here is the snippet of my code that computes the IoU:

import tensorflow as tf

def get_iou(masks, predictions):
    ious = []
    for i in range(batch_size):  # batch_size is defined elsewhere in my script
        mask = masks[i]
        pred = predictions[i]
        masks_sum = tf.reduce_sum(mask)
        predictions_sum = tf.reduce_sum(pred)  # sum of predicted pixels, to match masks_sum in the union formula
        # IoU = intersection / (|mask| + |pred| - intersection)
        intersection = tf.reduce_sum(tf.multiply(mask, pred))
        union = masks_sum + predictions_sum - intersection
        iou = intersection / union
        ious.append(iou)
    return ious

iou = get_iou(masks, predictions)
mean_iou_loss = -tf.log(tf.reduce_sum(iou))
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)

The code runs and computes the IoU as expected. However, the issue I am having is that the loss does not decrease. The model does train, but the results are less than ideal, so I am wondering whether I am implementing it correctly. Do I have to compute the gradients myself? I can compute the gradients for this IoU loss derived in this paper using tf.gradients(), but I am not sure how to incorporate that with tf.train.AdamOptimizer(). Reading the documentation, I feel like compute_gradients and apply_gradients are the methods I need to use, but I can't find any examples of how to use them. My understanding is that the TensorFlow graph should be able to come up with the gradients itself via the chain rule, so is a custom gradient even necessary here? If a custom gradient is not necessary, then I may just have an ill-posed problem and need to adjust some hyperparameters.
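For what it's worth, my reading of the docs is that minimize is a shorthand for exactly these two calls, so I would expect the combination to look roughly like this (an untested sketch):

optimizer = tf.train.AdamOptimizer(0.001)

# minimize(loss) is equivalent to compute_gradients followed by apply_gradients
grads_and_vars = optimizer.compute_gradients(mean_iou_loss)

# The (gradient, variable) pairs could be inspected or modified here,
# e.g. swapped for gradients from tf.gradients() or clipped
train_op = optimizer.apply_gradients(grads_and_vars)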

Note: I have tried TensorFlow's implementation of IoU, tf.metrics.mean_iou(), but it spits out inf every time, so I have abandoned it.
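(For context, tf.metrics.mean_iou is a streaming metric that returns a (value, update_op) pair and keeps its counters in local variables, so it is not a differentiable loss in any case. A sketch of what I understand the intended usage to be; labels_idx and preds_idx are hypothetical names for per-pixel integer class-index tensors:)

miou, miou_update = tf.metrics.mean_iou(labels=labels_idx,
                                        predictions=preds_idx,
                                        num_classes=2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())  # the metric's confusion-matrix counters
    sess.run(miou_update)                       # accumulate first...
    print(sess.run(miou))                       # ...then read the value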


1 Answer


Gradient computation happens inside the optimizer.minimize function, so no explicit gradient handling is needed inside the loss function. However, your implementation simply lacks an optimizable, trainable variable.

iou = get_iou(masks, predictions)
mean_iou_loss = tf.Variable(initial_value=-tf.log(tf.reduce_sum(iou)), name='loss', trainable=True)
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)

Numerical stability, differentiability, and the particular implementation aside, this should be enough to use it as a loss function that changes with iterations.
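On the numerical-stability point: a common soft-IoU variant (my sketch, not part of the fix above) adds a small smoothing constant so the ratio stays finite when mask and prediction are both empty, and it expects pred to be a probability map so everything stays differentiable:

def soft_iou_loss(mask, pred, smooth=1e-6):
    # pred should be a probability map (e.g. sigmoid output), not argmax output
    intersection = tf.reduce_sum(mask * pred)
    union = tf.reduce_sum(mask) + tf.reduce_sum(pred) - intersection
    # 1 - IoU, so the loss decreases as the overlap improves
    return 1.0 - (intersection + smooth) / (union + smooth)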

Also take a look at:

https://arxiv.org/pdf/1902.09630.pdf

Why does one not use IOU for training?

  • Thanks for the answer, but this fix does not work. The code hangs at the stage of defining the tensor variable: `mean_iou_loss = tf.Variable(initial_value=-tf.log(tf.reduce_sum(iou)), name='loss', trainable=True)` – MasterYoda Mar 31 '19 at 18:09
  • Strange, it absolutely should work. Does it throw any error? – Sharky Mar 31 '19 at 18:10
  • Try using some small random data and set logging to INFO; maybe it's simply an OOM error. – Sharky Mar 31 '19 at 18:40
  • It's not OOM; I have tried a batch size of one and checked memory via `nvidia-smi`, and nothing changed. It seems to hang at various portions of the graph. I tried initializing it to a constant instead of `-tf.log(tf.reduce_sum(iou))`, and it works that way. – MasterYoda Apr 01 '19 at 23:44
  • What kind of input data do you use? Type, dimensions? – Sharky Apr 02 '19 at 07:47
  • So in order to get the predictions for the bounding box, I have to do a `tf.argmax` operation, which I don't think the gradient is defined for. This could be the source of my problem. – MasterYoda Apr 08 '19 at 04:24
  • Yes, it's a widely discussed topic, but in many cases argmax can be substituted with softmax (see the sketch after this thread). And in many cases, `None`-gradient problems arise from integer-type data, which is a problem for TensorFlow. – Sharky Apr 08 '19 at 06:20
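To make the softmax substitution from the last comment concrete, a minimal sketch (the logits name, NHWC layout, and two-class setup are assumptions, not from the thread):

# logits: [batch, height, width, num_classes] raw network output (assumed layout)
probs = tf.nn.softmax(logits, axis=-1)

# Hard prediction -- no gradient flows through argmax:
# pred = tf.argmax(logits, axis=-1)

# Soft, differentiable stand-in: probability of the foreground class
pred = probs[..., 1]
ious = get_iou(masks, pred)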