
I am trying to quantize an ONNX model using the onnxruntime quantization tool.

My quantization code is below:

import onnx
from quantize import quantize, QuantizationMode

# Load the onnx model     
model = onnx.load('3ddfa_optimized_withoutflatten.onnx')

# Quantize
quantized_model = quantize(model, quantization_mode=QuantizationMode.IntegerOps)
 
# Save the quantized model
onnx.save(quantized_model, 'quantized_model.onnx')

After running this, the model I am getting is 0-dimensional. What arguments do I have to pass to the quantize function so that I get a proper model?


3 Answers


Try this:

import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType

def quantize_onnx_model(onnx_model_path, quantized_model_path):
    # quantize_dynamic reads the model from disk and writes the quantized model out,
    # so there is no need to onnx.load() the model yourself
    quantize_dynamic(onnx_model_path,
                     quantized_model_path,
                     weight_type=QuantType.QInt8)

    print(f"quantized model saved to: {quantized_model_path}")

quantize_onnx_model("/content/drive/MyDrive/Datasets/model.onnx",
                    "/content/drive/MyDrive/Datasets/model_quant.onnx")

Unless you share the onnx model, it is hard to tell the cause.

For OnnxRuntime 1.4.0, you can try the following:

    # onnx_opt_model is the ModelProto you loaded with onnx.load(...)
    quantized_model = quantize(onnx_opt_model,
                               quantization_mode=QuantizationMode.IntegerOps,
                               symmetric_weight=True,
                               force_fusions=True)

If the problem still exists, please share your onnx model so that we can take a look.

  • Hey, after trying this I am getting a quantized model with reduced size, but I was expecting improved inference time and I do not see any improvement. Can you suggest something on that aspect? – Debjyoti Banerjee Jun 29 '22 at 08:09
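
On the inference-time question above: dynamic quantization in onnxruntime mostly helps MatMul/Gemm-heavy models (e.g. transformers) on CPU, so convolution-heavy models may show little speedup. To measure it rather than guess, a rough CPU timing comparison could look like the sketch below; the model paths, run count and dummy input are placeholders:

import time
import numpy as np
import onnxruntime as ort

def avg_latency(model_path, runs=50):
    # CPU-only session; adjust providers/threads for your setup
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dummy = np.random.rand(*shape).astype(np.float32)
    sess.run(None, {inp.name: dummy})  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {inp.name: dummy})
    return (time.perf_counter() - start) / runs

print("fp32 avg latency :", avg_latency("model.onnx"))
print("int8 avg latency :", avg_latency("model_quant.onnx"))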

The following code works for me.

import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType

model_fp32 = 'yolov8.onnx'
model_quant = 'yolov8.quant.onnx'
quantize_dynamic(model_fp32, model_quant)  # the quantized model is written to model_quant
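
A quick way to confirm the quantization had an effect is to compare file sizes; storing weights as 8-bit instead of 32-bit should make the file noticeably smaller. A small sketch using the file names from the snippet above:

import os

# Weights are stored as int8 instead of float32, so expect a large size drop
print("fp32 :", os.path.getsize('yolov8.onnx') / 1e6, "MB")
print("int8 :", os.path.getsize('yolov8.quant.onnx') / 1e6, "MB")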