
TensorFlow Probability layers (e.g. DenseFlipout) have a losses method (or property) which gets the "losses associated with this layer." Can someone explain what these losses are?

After browsing the Flipout paper, I think the losses refer to the Kullback-Leibler divergence between the prior and posterior distributions of the weights and biases. If someone is more knowledgeable about these things than I am, please correct me.

smith

1 Answer


Your suspicion is correct, albeit poorly documented. For example, in the piece of code below

import tensorflow as tf
import tensorflow_probability as tfp

model = tf.keras.Sequential([
    tfp.layers.DenseFlipout(512, activation=tf.nn.relu),
    tfp.layers.DenseFlipout(10),
])

logits = model(features)
neg_log_likelihood = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

kl = sum(model.losses)  # one KL term per Flipout layer, summed

# The negative log-likelihood and the KL term are combined
loss = neg_log_likelihood + kl

# TF1-style training op, as in the documentation example
train_op = tf.train.AdamOptimizer().minimize(loss)

provided in the documentation of the DenseFlipout layer, the losses are summed to get the KL term, while the log-likelihood term is computed separately; the two are combined to form the (negative) ELBO.
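The arithmetic above can be sketched in plain Python with hypothetical numbers (the NLL values and per-layer KLs below are made up for illustration; in practice they come from the data and from `model.losses`):

```python
# Toy sketch (not TFP code): how the negative ELBO is assembled from
# a per-example NLL and per-layer KL penalties.

# Hypothetical per-example negative log-likelihoods for a batch of 4.
nll_per_example = [0.7, 1.2, 0.3, 0.9]
neg_log_likelihood = sum(nll_per_example) / len(nll_per_example)

# Hypothetical KL values contributed by each Flipout layer
# (what the entries of model.losses would hold, one per layer).
layer_kls = [2.5, 0.8]
kl = sum(layer_kls)  # losses are summed, as in sum(model.losses)

# The combined objective: the negative ELBO.
loss = neg_log_likelihood + kl
```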

You can see the loss being added here; following a few indirections, it turns out that the {kernel,bias}_divergence_fn is being used, and that in turn defaults to a lambda that calls tfd.kl_divergence(q, p).
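For intuition about what that default divergence computes: with the usual mean-field Normal posterior and standard Normal prior, the per-weight KL has a closed form. A minimal sketch (assuming univariate Normals; this is the analytic formula, not a TFP call):

```python
import math

def kl_normal(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    """Closed-form KL(N(mu_q, sigma_q) || N(mu_p, sigma_p)) for scalars."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)

kl_normal(0.0, 1.0)  # identical distributions -> 0.0
kl_normal(1.0, 1.0)  # posterior mean shifted by 1 -> 0.5
```

Summing this quantity over all weights and biases of a layer is, conceptually, what that layer contributes to `model.losses`.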

Chris Suter
  • Why are the losses summed and not averaged? Intuitively, it seems to me that we should average them and then we should divide this average by the number of examples in the mini-batch before adding the result to the final loss. In the case of Keras, do you know how exactly the KL divergence of each layer is added to the final loss? – nbro Jan 16 '20 at 15:11