
Currently I train a Keras (on TensorFlow) model with the default setting - float32.

After training, the network is quantized: the weights are cast to float16. This improves performance by ~3x while keeping the same accuracy.
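
One common way to do such a post-training float16 cast is TensorFlow Lite's float16 post-training quantization; a minimal sketch, assuming a trained tf.keras model and the TF 2.x converter API (the model path is a placeholder, and this is not necessarily the exact route taken here):

import tensorflow as tf

model = tf.keras.models.load_model("trained_model.h5")  # hypothetical path to the trained float32 model

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store the weights as float16
tflite_fp16_model = converter.convert()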

I tried training from the start using float16 and failed miserably. I cannot find any link that explains whether that is possible and, if not, why it is not possible.

YoavEtzioni
  • Is `tensorflow1.x` a must? It's getting deprecated kinda fast (e.g. tutorials and docs moved to [github](https://github.com/tensorflow/docs/tree/master/site/en/r1) and are in maintenance mode). If it's not, there is an experimental `keras` policy in `tf2.0` for mixed precision training [explained here](https://www.tensorflow.org/guide/keras/mixed_precision) (see the sketch after these comments). – Szymon Maszke Feb 11 '20 at 15:18
  • Most of tf 2.0 is experimental. I have tried the same and failed, probably for reasons related to other elements in my network. – YoavEtzioni Feb 11 '20 at 15:39
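
For reference, the experimental Keras mixed-precision policy mentioned in the comment above can be enabled like this (a minimal sketch using the tf 2.1-era experimental API from the linked guide; the toy model is illustrative):

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.mixed_precision import experimental as mixed_precision

# Layers compute in float16 but keep float32 variables; dynamic loss scaling
# is applied automatically when compiling and fitting under this policy.
policy = mixed_precision.Policy("mixed_float16")
mixed_precision.set_policy(policy)

inputs = tf.keras.Input(shape=(784,))
x = layers.Dense(256, activation="relu")(inputs)
x = layers.Dense(10)(x)
# Keep the final softmax in float32 for numerical stability, as the guide recommends.
outputs = layers.Activation("softmax", dtype="float32")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")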

1 Answer


Automatic Mixed Precision (AMP) from NVIDIA might be the way to go.

From what I've gathered, since TensorFlow 1.14 it is (was) supported upstream. All you would have to do is wrap your optimizer like this:

opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

You might also need to set a specific environment variable from within your Python script, namely:

import os
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

The above should already employ good mixed-precision training practices (e.g. loss scaling, keeping float32 where necessary, etc.).
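
Put together, a minimal sketch of how that wrapping can look with a `tf.keras` model on tf 1.14+ (the model, optimizer and hyperparameters are placeholders):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
# The rewrite inserts float16 casts where it is safe to do so and adds dynamic
# loss scaling around the wrapped optimizer.
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")
# model.fit(x_train, y_train, batch_size=256, epochs=5)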

A good resource for this solution is NVIDIA's official documentation.

Some other resources I gathered that might also be useful (though they do not seem to indicate you would have to do anything more): here, here, or here.

I would advise against manual casting, as you might easily lose precision (e.g. in the BatchNorm statistics used during inference) unless you know the ins and outs of specific layers.

Additionally, you might also check the bfloat16 (brain float) type from Google, which has the exponent part of float32 (8 bits) and a smaller fraction. This allows it to keep a greater range of values (e.g. when computing gradients) than float16, which lets one avoid loss scaling.
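
A small illustration (assuming tf 2.x eager execution; the constant is arbitrary): values above float16's maximum finite value of ~65504 overflow to inf, while bfloat16 still represents them thanks to its float32-sized exponent:

import tensorflow as tf

x = tf.constant(70000.0)        # larger than float16's max finite value (~65504)
print(tf.cast(x, tf.float16))   # inf  -> this overflow is why float16 training needs loss scaling
print(tf.cast(x, tf.bfloat16))  # ~7e4 -> in range, but with fewer fraction bits than float16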

The above (bfloat16) should be useful mainly on TPUs; AFAIK NVIDIA GPUs' support for it is not too great (someone correct me if I'm wrong). Some information here.

Szymon Maszke
  • If it was useful mainly in Google's TPUs then why would NVIDIA make the effort to support it? It can be used with any NVIDIA GPU that supports tensor-cores. V100 and T4 are two of the cards with tensor core support. – William D. Irons Feb 11 '20 at 16:30
  • @WilliamD.Irons I meant the `bfloat16` format specifically. It's not TPU-specific, but TPUs are Google's hardware and so is the `bfloat16` idea, hence that's where it is supported "out of the box". You can find discussions regarding `bfloat16` on Intel's CPUs and GPUs for PyTorch, for example [here](https://github.com/pytorch/pytorch/issues/8021), [here](https://github.com/pytorch/pytorch/issues/23149), [here](https://github.com/pytorch/pytorch/issues/23509). I'm not sure about the current state of its implementation, though. – Szymon Maszke Feb 11 '20 at 16:47
  • @WilliamD.Irons When it comes to Tensor Cores those technologies target `float16` and `int8`, not `bfloat16` AFAIK. – Szymon Maszke Feb 11 '20 at 16:49
  • true, bfloat16 is just for TPUs, but the original question was just float16 :). Overall good answer. – William D. Irons Feb 11 '20 at 16:53