
I'm using TensorFlow 2.1 and Python 3, creating my custom training model by following the tutorial "TensorFlow - Custom training: walkthrough".

I'm trying to use the Hamming distance as my loss function:

import tensorflow as tf
import tensorflow_addons as tfa

def my_loss_hamming(model, x, y):
  global output
  output = model(x)

  return tfa.metrics.hamming.hamming_loss_fn(y, output, threshold=0.5, mode='multilabel')


def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
      tape.watch(model.trainable_variables)
      loss_value = my_loss_hamming(model, inputs, targets)

  return loss_value, tape.gradient(loss_value, model.trainable_variables)

When I call it:

loss_value, grads = grad(model, feature, label)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

The grads variable is a list of 38 None values.

And I get the error:

No gradients provided for any variable: ['conv1_1/kernel:0', ...]

Is there any way to use the Hamming distance without interrupting the gradient chain registered by the gradient tape?


1 Answer


Apologies if I'm saying something obvious, but backpropagation, the fitting algorithm for neural networks, works through gradients: for each batch of training data you compute how much the loss function will improve or degrade if you move a particular trainable weight by a very small amount delta.
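To make that concrete, here is a minimal sketch (with a made-up one-weight "model" and a differentiable loss, purely for illustration) showing that the gradient reported by a GradientTape matches that "move the weight by a small delta" intuition:

import tensorflow as tf

# Hypothetical one-weight model and a differentiable loss, only to illustrate
# that the tape's gradient matches (loss(w + delta) - loss(w)) / delta.
w = tf.Variable(0.5)
x, y = tf.constant(2.0), tf.constant(3.0)

def loss_fn():
  return (w * x - y) ** 2

with tf.GradientTape() as tape:
  loss = loss_fn()
analytic_grad = tape.gradient(loss, w)    # d(loss)/dw = 2 * (w*x - y) * x = -8.0

delta = 1e-4
w.assign_add(delta)
finite_diff = (loss_fn() - loss) / delta  # approximately -8.0 as well
print(float(analytic_grad), float(finite_diff))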

Hamming loss is by definition not differentiable, so for small movements of trainable weights you will never see any change in the loss; that is exactly why the tape returns None for every gradient. I imagine it is only provided for final measurement of a trained model's performance rather than for training.
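You can reproduce the effect in isolation (a toy sketch, not the original model): any hard thresholding step, like the threshold=0.5 used by the Hamming loss, is a step function, so the tape has nothing to differentiate through:

import tensorflow as tf

x = tf.Variable([0.3, 0.8])
with tf.GradientTape() as tape:
  # Hard thresholding (as in hamming_loss_fn with threshold=0.5) is a step
  # function: its derivative is zero almost everywhere and undefined at 0.5.
  discretized = tf.cast(x > 0.5, tf.float32)
  loss = tf.reduce_mean(discretized)

print(tape.gradient(loss, x))  # None - the comparison/cast cuts the gradient chain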

If you want to train a neural net through backpropagation you need to use some differentiable loss - one that can help the model move its weights in the right direction. People sometimes smooth out losses such as the Hamming loss and create differentiable approximations - e.g. here it could be something which penalizes predictions that are close to the target answer less harshly, rather than just giving out 1 for everything above the threshold and 0 for everything else.
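As one possible smoothed stand-in (just a sketch; my_soft_hamming_loss is a name invented here, not a tensorflow_addons function), you can keep the "fraction of wrong labels" flavour but stay differentiable by measuring how far each prediction is from its 0/1 target instead of thresholding it:

import tensorflow as tf

def my_soft_hamming_loss(y_true, y_pred):
  # Differentiable surrogate: mean absolute distance to the 0/1 targets.
  # It equals the Hamming loss when predictions are exactly 0 or 1, but gives
  # partial credit (and useful gradients) for predictions in between.
  y_true = tf.cast(y_true, y_pred.dtype)
  return tf.reduce_mean(tf.abs(y_true - y_pred))

Such a function can be dropped into the question's grad() in place of my_loss_hamming, while tfa.metrics.hamming.hamming_loss_fn is kept for evaluation only.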

  • Thanks for your answer. This question is related to this one: https://datascience.stackexchange.com/q/75758/31726, where I explain what I'm trying to do, and also to this one, https://math.stackexchange.com/q/3713624/193243, which explains why I'm using the Hamming distance. Thanks again. – VansFannel Jun 11 '20 at 06:29
  • Absolutely nothing wrong with your approach. As I was saying you just need to smooth out your loss function if you want to use it for training. – Alexander Pivovarov Jun 11 '20 at 06:43
  • Thanks, but I don’t know how to use my approach as a differentiable loss function. – VansFannel Jun 11 '20 at 08:07
  • And there is a Hamming distance loss function, hamming_loss_fn, https://www.tensorflow.org/addons/api_docs/python/tfa/metrics/hamming/hamming_loss_fn. Is this also not differentiable? Thanks. – VansFannel Jun 11 '20 at 10:28
  • It seems from your explanation that your output has shape `(BatchSize, FlattenedPixelsNxM)` and the target has the same shape, with 0/1 values describing an area on the image. With `threshold=0.5` the Hamming loss effectively just discretizes your output, replacing each value >0.5 with 1 and each value <0.5 with 0, then computes the ratio of correctly guessed pixels to the total number. There are multiple ways to smooth this thing, but the most obvious and simple one would be to just use an MSE loss between your prediction and the target. – Alexander Pivovarov Jun 11 '20 at 14:49
  • If you don't like that the MSE loss would penalize the model for outputting values >1 for target 1 and values <0 for target 0, you can apply some simple clamping to your model's output before computing the loss, e.g. just clamp all values between 0 and 1 with the `tf.clip_by_value` function (see the sketch after these comments). – Alexander Pivovarov Jun 11 '20 at 14:52
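Putting those last two comments together, a possible shape for the loss (again just a sketch: my_loss_mse is an invented name, and model, feature and label are the objects from the question's training loop) would be:

import tensorflow as tf
import tensorflow_addons as tfa

def my_loss_mse(model, x, y):
  output = tf.clip_by_value(model(x), 0.0, 1.0)  # keep predictions in [0, 1]
  return tf.reduce_mean(tf.square(tf.cast(y, output.dtype) - output))

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = my_loss_mse(model, inputs, targets)  # differentiable -> real gradients
  return loss_value, tape.gradient(loss_value, model.trainable_variables)

# The Hamming loss can still be reported as a metric after each step, e.g.:
# tfa.metrics.hamming.hamming_loss_fn(label, model(feature), threshold=0.5, mode='multilabel')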