
Before I knew about automatic mixed precision, I manually converted the model and the data to half precision with half() for training. But the training results were not good at all.

Then I used automatic mixed precision to train a network, which gives decent results. But when I save a checkpoint, the parameters in the checkpoint are still in fp32. I want to save a checkpoint in fp16, so I would like to ask if and how I can do that. This also makes me wonder: when conv2d runs under autocast, are the parameters of conv2d also halved, or is it only the data that is halved?
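For context, here is a minimal sketch of the standard autocast/GradScaler training pattern I am referring to (the model, optimizer, and data below are just placeholders, not my actual code):

```python
import torch
from torch import nn

# Minimal sketch of the standard autocast/GradScaler training pattern
# (model, optimizer, and data are placeholders).
model = nn.Conv2d(3, 8, kernel_size=3).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    data = torch.randn(4, 3, 32, 32, device="cuda")
    target = torch.randn(4, 8, 30, 30, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():          # ops run in fp16 or fp32 as appropriate
        loss = nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()            # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```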

  • By the way, I want to save the checkpoints in fp16 because I want to use half precision for inference. – lee Lin Oct 17 '22 at 04:50

1 Answer


It does not apply half precision to all parameters. autocast analyzes each operation separately: some run in FP16 and others in FP32.

From the documentation here.

torch.cuda.amp provides convenience methods for mixed precision, where some operations use the torch.float32 (float) datatype and other operations use torch.float16 (half). Some ops, like linear layers and convolutions, are much faster in float16 or bfloat16. Other ops, like reductions, often require the dynamic range of float32. Mixed precision tries to match each op to its appropriate datatype, which can reduce your network’s runtime and memory footprint.
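To make this concrete, here is a small sketch (not from the documentation) that checks the dtypes: under autocast the conv weights stay in fp32, while the convolution itself produces fp16 output.

```python
import torch
from torch import nn

conv = nn.Conv2d(3, 8, kernel_size=3).cuda()
x = torch.randn(1, 3, 32, 32, device="cuda")

with torch.cuda.amp.autocast():
    out = conv(x)

print(conv.weight.dtype)  # torch.float32 -> the stored parameters are not halved
print(out.dtype)          # torch.float16 -> the conv op itself ran in half precision
```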

About the checkpoints: a copy of the weights is maintained in FP32 precision to be used by the optimizer, as explained here.
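If the goal is a half-precision checkpoint for inference, one possibility (my assumption, not something the docs prescribe) is to cast the trained fp32 state dict to fp16 before saving:

```python
import torch

# Sketch (an assumption): cast the trained fp32 weights to fp16 before saving.
state_dict = model.state_dict()                      # 'model' is the trained fp32 model
fp16_state = {k: v.half() if v.is_floating_point() else v
              for k, v in state_dict.items()}
torch.save(fp16_state, "model_fp16.pt")

# At inference time, load into a half-precision copy of the model and
# feed half-precision inputs:
#   model.half().load_state_dict(torch.load("model_fp16.pt"))
#   output = model(input.half())
```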

Rafael Toledo