
I converted an XGBoost classifier model to an ONNX model with onnxmltools and quantized the ONNX model using ONNX Runtime's quantize_dynamic().

But the quantized ONNX model was not smaller in file size or faster at inference.

I used Anaconda3, Python 3.8.5, ONNX 1.10.0, ONNX Runtime 1.10.0 and ONNXMLTools 1.10.0 to convert and quantize models on Windows 10.

I created an XGBoost classifier model and converted it to an ONNX model with the following code:

import onnx
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType
from xgboost import XGBClassifier

# X_train, y_train and num_features come from the training data (not shown here).
clf_xgb = XGBClassifier().fit(X_train, y_train)

onnx_model_path = "xgb_clf.onnx"

# Input signature: a float tensor with a dynamic batch dimension and num_features columns.
initial_type = [('float_input', FloatTensorType([None, num_features]))]

onnx_model = onnxmltools.convert.convert_xgboost(clf_xgb, initial_types=initial_type, target_opset=10)
onnx.save(onnx_model, onnx_model_path)
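
As a sanity check (not shown above), the converted model can be run with onnxruntime and compared against the original classifier; X_test here is a placeholder for any held-out data:

import numpy as np
import onnxruntime as ort

# Run the converted model; the first output of the converted classifier is the predicted labels.
sess = ort.InferenceSession(onnx_model_path)
input_name = sess.get_inputs()[0].name
onnx_labels = sess.run(None, {input_name: X_test.astype(np.float32)})[0]
print("Labels match XGBoost:", np.array_equal(onnx_labels, clf_xgb.predict(X_test)))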

I quantized the ONNX model with the following code:

from onnxruntime.quantization import quantize_dynamic, QuantType

model = 'xgb_clf.onnx'
model_quant = 'xgb_clf_quant.onnx'

# Dynamic quantization with unsigned 8-bit weights; the result is written to model_quant.
quantize_dynamic(model, model_quant, weight_type=QuantType.QUInt8)

After quantization, the file size and inference time of the quantized ONNX model were almost the same as those of the original ONNX model.
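
For context, the file sizes and timings can be compared with a rough sketch like this (X_test again is a placeholder for the evaluation data):

import os
import time
import numpy as np
import onnxruntime as ort

# Compare on-disk size and average latency of the original and quantized models.
for path in ("xgb_clf.onnx", "xgb_clf_quant.onnx"):
    sess = ort.InferenceSession(path)
    feed = {sess.get_inputs()[0].name: X_test.astype(np.float32)}
    start = time.perf_counter()
    for _ in range(100):
        sess.run(None, feed)
    avg_ms = (time.perf_counter() - start) / 100 * 1000
    print(f"{path}: {os.path.getsize(path)} bytes, {avg_ms:.2f} ms per run")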

I uploaded the quantized ONNX model to https://netron.app/ and got the following graph:

[Netron graph of the quantized ONNX model]

The ONNX model contains only one op, TreeEnsembleClassifier, which is not one of the ops commonly found in CNN models such as Conv, MaxPool, or Relu.
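
The same information can be read from the model in Python instead of Netron, for example:

from collections import Counter
import onnx

# Count the op types in the quantized graph to see what quantize_dynamic had to work with.
quant_model = onnx.load("xgb_clf_quant.onnx")
print(Counter(node.op_type for node in quant_model.graph.node))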

Is this why quantization does not reduce the model file size or the inference time?

If so, how can I quantize an ONNX model converted from an XGBoost classifier?

I appreciate any help or guidance.

SC Chen