I was able to successfully quantise a PyTorch model for Hugging Face text classification with Intel LPOT (Neural Compressor).
I now have both the original FP32 model and the quantised INT8 model on my machine. For inference, I loaded the quantised LPOT model with the code below:
from transformers import AutoModelForSequenceClassification
from lpot.utils.pytorch import load

# Load the original FP32 model, then apply the LPOT-tuned checkpoint on top of it
model = AutoModelForSequenceClassification.from_pretrained('fp32/model/path')
modellpot = load("path/to/lpotmodel/", model)
I do see some improvement in inference time, but I wanted to confirm that the model weights have actually been quantised to data types such as INT8 or FP16, since that should ideally be the reason for the speed-up. However, when I iterate over the model weights and print their dtypes, every weight reports FP32:
for param in modellpot.parameters():
    print(param.data.dtype)
Output:
torch.float32
torch.float32
torch.float32
torch.float32
torch.float32
torch.float32
torch.float32
..
...
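Would checking the module types instead of the parameter dtypes be the right approach? Below is a rough sketch of what I have in mind (it assumes LPOT applies PyTorch's standard post-training quantisation, so float modules would be replaced by quantised module classes and the INT8 weights would live in the state_dict rather than in parameters()):

import torch

# Assumption: LPOT swaps float modules (e.g. nn.Linear) for torch.nn.quantized.*
# equivalents; if so, the quantised layers should show up by module type.
for name, module in modellpot.named_modules():
    if "quantized" in type(module).__module__:
        print(name, type(module).__name__)

# The state_dict should also expose quantised weight tensors (dtype torch.qint8),
# even though .parameters() only yields the remaining float tensors.
for name, value in modellpot.state_dict().items():
    if isinstance(value, torch.Tensor):
        print(name, value.dtype)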
How do I verify whether my PyTorch model has actually been quantised?