
I was able to successfully quantise a PyTorch model for Hugging Face text classification with Intel LPOT (Neural Compressor).

I now have the original FP32 model and the quantised INT8 model on my machine. For inference I loaded the quantised LPOT model with the code below:

from transformers import AutoModelForSequenceClassification
from lpot.utils.pytorch import load

model = AutoModelForSequenceClassification.from_pretrained('fp32/model/path')
modellpot = load("path/to/lpotmodel/", model)

I am able to see a time improvement of sorts, but I wanted to confirm that the model weights have actually been quantised and use data types such as int8 or fp16, which should ideally be the reason for the speed-up. I iterate over the model weights and print the dtype of each, but I see that all weights are of type fp32:

for param in modellpot.parameters():
  print(param.data.dtype)

Output:

torch.float32
torch.float32
torch.float32
torch.float32
torch.float32
torch.float32
torch.float32
..
...

How do I verify whether my PyTorch model has been quantised?


1 Answer


Use print(modellpot) to check whether the model is quantised. For example, a Linear layer will have been converted to a QuantizedLinear layer. Only layer types that PyTorch supports for quantisation are converted, so not all parameters end up as int8/uint8.

When the model is printed, each converted layer shows its data type in the output; for example, a layer will report its dtype as qint8 if int8 quantisation has been performed.
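If you would rather check programmatically than read the printed module tree, a minimal sketch along these lines can walk the modules and report the quantised ones together with their weight dtype. It assumes modellpot is the model loaded as in the question; the exact quantised module classes can vary between PyTorch versions.

import torch

# Quantised Linear variants: torch.nn.quantized.dynamic.Linear is produced by
# dynamic quantisation, torch.nn.quantized.Linear by static quantisation.
quantized_linear_types = (
    torch.nn.quantized.Linear,
    torch.nn.quantized.dynamic.Linear,
)

# Walk the module tree and report any quantised layers with their weight dtype.
for name, module in modellpot.named_modules():
    if isinstance(module, quantized_linear_types):
        # weight() returns a quantised tensor, typically with dtype torch.qint8
        print(name, type(module).__name__, module.weight().dtype)

Note that iterating over modellpot.parameters() as in the question will not surface these weights: quantised modules store their packed int8 weights internally rather than as regular nn.Parameter objects, which is why that loop only shows the remaining float32 parameters.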
