Automatic Mixed Precision (AMP) from NVIDIA might be the way to go.
Since TensorFlow 1.14 it has been supported upstream. All you would have to do is wrap your optimizer like this:
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
You might also need to set a specific environment variable from within your Python script, namely:
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
The above should already employ good mixed precision training practices (e.g. loss scaling, keeping float32 where necessary, etc.).
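For context, here is a minimal sketch of how the pieces fit together in TF 1.14-style graph mode; the tiny regression model, placeholders, and hyperparameters are my own illustrative assumptions, not anything prescribed by the docs:

import os
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'  # graph-rewrite env var (redundant with the explicit wrapper below, but harmless)

import tensorflow as tf  # assuming TF 1.14+ graph mode

# Toy model just for illustration; replace with your own network.
features = tf.placeholder(tf.float32, [None, 32])
labels = tf.placeholder(tf.float32, [None, 1])
logits = tf.layers.dense(features, 1)
loss = tf.losses.mean_squared_error(labels, logits)

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
# The rewrite inserts float16 casts where it is safe and adds automatic loss scaling.
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
train_op = opt.minimize(loss)

The wrapped optimizer behaves like the original one, so the rest of the training loop (session.run(train_op, ...)) stays unchanged.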
A good resource for this solution is NVIDIA's official documentation.
Some other resources I gathered which might also be useful (though they do not seem to indicate you would have to do anything more): here, here, or here.
I would advise against manual casting, as you might easily lose precision (e.g. in BatchNorm statistics used during inference) unless you know the ins and outs of the specific layers.
Additionally, you might also check out Google's bfloat16 (brain float) type, which has the same exponent size as float32 (8 bits) but a smaller fraction. This allows it to keep a greater range of values (e.g. when computing gradients) compared to float16, which lets one avoid loss scaling.
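To make the range difference concrete, here is a quick comparison you can run (assuming eager execution, i.e. TF 2.x or tf.enable_eager_execution() in 1.x):

import tensorflow as tf

x = tf.constant(70000.0)        # above float16's max of ~65504
print(tf.cast(x, tf.float16))   # overflows to inf
print(tf.cast(x, tf.bfloat16))  # stays finite, at reduced precision

float16 trades exponent bits for fraction bits, so it is more precise per value but overflows much earlier, hence the need for loss scaling in the float16 AMP path above.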
bfloat16 should be useful mainly on TPUs; AFAIK, NVIDIA GPU support for it is not great (someone correct me if I'm wrong). Some information here.