I am using the OpenVINO Model Optimizer to convert an ONNX model containing a single ConvInteger operation to OpenVINO IR format:
mo --input_model {onnx_model}
The ONNX ConvInteger operator takes input and weight tensors with INT8/UINT8 precision and produces an output tensor with INT32 precision, which is the only output precision the operator supports.
When the model is converted to OpenVINO, the input and weight tensors are automatically promoted to INT32 precision, and Convert operators are inserted into the graph to perform this precision change.
Is it possible to force INT8/UINT8 precision in the OpenVINO model? Alternatively, is there a simple way to convert the precisions back to INT8/UINT8 after the OpenVINO model has been created?
Thanks