
I am trying to quantize an ONNX model using the onnxruntime quantization tool.

My quantization code is below:

import onnx
from quantize import quantize, QuantizationMode

# Load the onnx model     
model = onnx.load('3ddfa_optimized_withoutflatten.onnx')

# Quantize
quantized_model = quantize(model, quantization_mode=QuantizationMode.IntegerOps)
 
# Save the quantized model
onnx.save(quantized_model, 'quantized_model.onnx')

After running this, the model I am getting is 0-dimensional. What arguments do I have to pass to the quantize function so that I get a proper model?


3 Answers


Try this:

import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType

def quantize_onnx_model(onnx_model_path, quantized_model_path):
    # quantize_dynamic reads the model from disk and writes the quantized model out,
    # so there is no need to onnx.load() the model yourself
    quantize_dynamic(onnx_model_path,
                     quantized_model_path,
                     weight_type=QuantType.QInt8)

    print(f"quantized model saved to: {quantized_model_path}")

quantize_onnx_model("/content/drive/MyDrive/Datasets/model.onnx",
                    "/content/drive/MyDrive/Datasets/model_quant.onnx")

Unless you share the onnx model, it is hard to tell the cause.

For OnnxRuntime 1.4.0, you can try the following:

    # onnx_opt_model is the ModelProto you loaded with onnx.load(...)
    quantized_model = quantize(onnx_opt_model,
                               quantization_mode=QuantizationMode.IntegerOps,
                               symmetric_weight=True,
                               force_fusions=True)

If the problem still exists, please share your onnx model so that we can take a look.

  • Hey, after trying this I am getting a quantized model with reduced size, but I was expecting improved inference time and I do not see any improvement. Can you suggest something on that aspect? – Debjyoti Banerjee Jun 29 '22 at 08:09
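
On the inference-time question above: dynamic quantization in onnxruntime mostly helps MatMul/Gemm-heavy models (e.g. transformers) on CPU, so convolution-heavy models may show little speedup. To measure it rather than guess, a rough CPU timing comparison could look like the sketch below; the model paths, run count and dummy input are placeholders:

import time
import numpy as np
import onnxruntime as ort

def avg_latency(model_path, runs=50):
    # CPU-only session; adjust providers/threads for your setup
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dummy = np.random.rand(*shape).astype(np.float32)
    sess.run(None, {inp.name: dummy})  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {inp.name: dummy})
    return (time.perf_counter() - start) / runs

print("fp32 avg latency :", avg_latency("model.onnx"))
print("int8 avg latency :", avg_latency("model_quant.onnx"))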

The following code works for me.

import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType

model_fp32 = 'yolov8.onnx'
model_quant = 'yolov8.quant.onnx'
quantize_dynamic(model_fp32, model_quant)  # the quantized model is written to model_quant
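
A quick way to confirm the quantization had an effect is to compare file sizes; storing weights as 8-bit instead of 32-bit should make the file noticeably smaller. A small sketch using the file names from the snippet above:

import os

# Weights are stored as int8 instead of float32, so expect a large size drop
print("fp32 :", os.path.getsize('yolov8.onnx') / 1e6, "MB")
print("int8 :", os.path.getsize('yolov8.quant.onnx') / 1e6, "MB")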