I am trying to use mixed-precision training with tf-slim in order to speed up the training of networks and make use of the Tensor Cores available on my GPUs. I also want to make use of multiple network architectures with pre-trained checkpoints.
An example of what Mixed-Precision training is and how it works can be found at https://devblogs.nvidia.com/mixed-precision-resnet-50-tensor-cores/
The basic idea is to:

1. Cast the inputs and weights to fp16 for the forward and backward pass.
2. Cast the values back to fp32 when adjusting the loss and weights.
3. Before the backward pass, multiply the loss by a loss scale.
4. When updating the weights, divide the gradients by the same loss scale.
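The reason for the loss scale in steps 3 and 4 can be shown with a minimal NumPy sketch (the gradient value `1e-8` and the scale `1024` are just illustrative numbers, not anything from the slim code): small gradient values underflow to zero when cast to fp16, and scaling before the cast preserves them.

```python
import numpy as np

# A gradient value too small to represent in fp16 (below the smallest
# fp16 subnormal, ~6e-8), so it underflows to zero when cast.
small_grad = 1e-8
unscaled = np.float16(small_grad)       # flushed to 0.0 in fp16

# With loss scaling: multiply the loss (and therefore the gradients)
# by a scale factor before the fp16 cast, then divide it back out in
# fp32 when applying the weight update.
loss_scale = 1024.0
scaled = np.float16(small_grad * loss_scale)   # now representable in fp16
recovered = np.float32(scaled) / loss_scale    # approximately small_grad again
```

Without the scale the gradient contribution is lost entirely; with it, the fp32 update sees a value close to the true gradient.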
This reduces the memory bandwidth and makes use of the Tensor Cores on Volta and Turing GPUs through the use of fp16.
My problem is that I can't figure out where to put the casts to fp16 and fp32 with tf-slim.
To start the training, I use the train_image_classifier.py script from models.research.slim
Do I need to do the cast within the definition files for the network architectures? Or do I need to apply the changes within the tf.contrib.slim files?