I converted an XGBoost classifier model to an ONNX model with onnxmltools and quantized the ONNX model using ONNX Runtime's quantize_dynamic().
However, the quantized ONNX model is neither smaller in file size nor faster at inference.
I used Anaconda3, Python 3.8.5, ONNX 1.10.0, ONNX Runtime 1.10.0 and ONNXMLTools 1.10.0 to convert and quantize models on Windows 10.
I created an XGBoost classifier model and converted it to an ONNX model with the following code:
import onnx
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType
from xgboost import XGBClassifier

# X_train, y_train and num_features come from my own dataset.
clf_xgb = XGBClassifier().fit(X_train, y_train)

onnx_model_path = "xgb_clf.onnx"
initial_type = [('float_input', FloatTensorType([None, num_features]))]
onnx_model = onnxmltools.convert.convert_xgboost(clf_xgb, initial_types=initial_type, target_opset=10)
onnx.save(onnx_model, onnx_model_path)
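As an optional sanity check, the saved file can be validated with the ONNX checker:

import onnx
# Raises an exception if the saved model is not a valid ONNX model.
onnx.checker.check_model(onnx.load(onnx_model_path))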
I quantized the ONNX model with the following code:
from onnxruntime.quantization import quantize_dynamic, QuantType

model = 'xgb_clf.onnx'
model_quant = 'xgb_clf_quant.onnx'
# quantize_dynamic writes the quantized model to model_quant.
quantize_dynamic(model, model_quant, weight_type=QuantType.QUInt8)
After quantization, I found that the file size and inference time of the original and quantized ONNX models were almost the same.
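For reference, this is roughly how I compared the two files; X_test here is a placeholder for my own test data:

import os
import time
import numpy as np
import onnxruntime as ort

# X_test is a placeholder for my own test data (shape [n_samples, num_features]).
X = X_test.astype(np.float32)
for path in (model, model_quant):
    print(path, os.path.getsize(path), "bytes")
    sess = ort.InferenceSession(path)
    input_name = sess.get_inputs()[0].name
    start = time.perf_counter()
    for _ in range(100):
        sess.run(None, {input_name: X})
    print("  average inference time:", (time.perf_counter() - start) / 100, "seconds")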
I uploaded the quantized ONNX model to https://netron.app/ and got the following graph:
There is only one op in the model, TreeEnsembleClassifier, which is not one of the common ops found in most CNN models, e.g. Conv, MaxPool, Relu.
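The same information can be obtained without Netron by listing the node types in each file (a quick sketch):

import onnx
from collections import Counter

for path in (model, model_quant):
    ops = Counter(node.op_type for node in onnx.load(path).graph.node)
    print(path, dict(ops))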
Is this the reason I cannot reduce the inference time and the model file size through quantization?
If so, how can I quantize an ONNX model converted from an XGBoost classifier model?
I appreciate any help or guidance.