I would like to train with the outputs of intermediate layers in order to use the soft nearest neighbour loss as a regularizer, as described in https://arxiv.org/pdf/1902.01889.pdf.
I have therefore tried to implement this with a GradientTape:
with tf.GradientTape() as tape:
    # tape.watch(inputs)
    predictions = model(inputs, training=True)

    softnn_loss = 0
    ## TODO: how do I get the outputs of all the intermediate layers here???
    for one_layer_output in intermediate_layers_output:
        softnn_loss += softnn_obj(one_layer_output, labels)

    pred_loss = loss_fn(labels, predictions)
    total_loss = pred_loss + lamb * softnn_loss

    # Add any regularization losses registered on the model (e.g. weight decay).
    if len(model.losses) > 0:
        regularization_loss = tf.math.add_n(model.losses)
        total_loss = total_loss + regularization_loss

gradients = tape.gradient(total_loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
The solutions I have read about require creating a new function (or sub-model) for each layer; my reading of that approach is sketched below. This would not work, as the gradients are not updated accordingly. How can I write this so that the outputs of all the intermediate layers are obtained with only one forward pass and the gradients are computed correctly to train this particular model?
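For reference, this is my understanding of that per-layer approach (my own paraphrase, not code from those answers; it assumes model is a functional Keras model or an already built Sequential):

import tensorflow as tf

# One sub-model per layer, sharing `model`'s weights.
layer_models = [tf.keras.Model(inputs=model.input, outputs=l.output)
                for l in model.layers]
# Each call below runs its own forward pass through the network up to that layer.
intermediate_layers_output = [m(inputs, training=True) for m in layer_models]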
What I am looking for is a way to obtain the intermediate layers' outputs with a single forward pass.
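To make the question concrete, something along these lines is what I have in mind (a rough sketch only; feature_model is just a name I made up, and it again assumes model is a functional Keras model or a built Sequential whose layer outputs are accessible):

import tensorflow as tf

# A single wrapper model whose outputs are every layer's activation; the last
# element is the final prediction. It reuses `model`'s layers and weights,
# so one call is one forward pass.
feature_model = tf.keras.Model(
    inputs=model.input,
    outputs=[layer.output for layer in model.layers])

with tf.GradientTape() as tape:
    *intermediate_layers_output, predictions = feature_model(inputs, training=True)
    softnn_loss = sum(softnn_obj(out, labels) for out in intermediate_layers_output)
    total_loss = loss_fn(labels, predictions) + lamb * softnn_loss

gradients = tape.gradient(total_loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))

Would gradients computed this way still be correct for model.trainable_variables, given that feature_model and model share the same layers?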