Dear developers and NN enthusiasts, I have quantized a model (8-bit post-training quantization) and I am trying to run inference on the resulting model with the TFLite interpreter.
In some cases the interpreter runs properly, and I can do inference on the quantized model as expected, with outputs close enough to those of the original model, so my setup appears to be correct. However, depending on the concrete quantized model, I frequently stumble across the following RuntimeError:
Traceback (most recent call last):
File ".\quantize_model.py", line 328, in <module>
interpreter.allocate_tensors()
File "---path removed---tf-nightly_py37\lib\site-packages\tensorflow\lite\python\interpreter.py", line 243, in allocate_tensors
return self._interpreter.AllocateTensors()
RuntimeError: tensorflow/lite/kernels/kernel_util.cc:154 scale_diff / output_scale <= 0.02 was not true.Node number 26 (FULLY_CONNECTED) failed to prepare.
Since the error appears to be related to the scale of the bias, I have retrained the original model using a bias_regularizer. However, the error persists.
Do you have any suggestions on how to avoid this error? Should I train or design the model in a different way? Is it possible to suppress this error and continue as usual (even if the accuracy is reduced)?
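For context, my conversion and inference setup follows the standard full-integer post-training quantization recipe. Here is a minimal sketch of it (the toy Dense model and the random calibration data below are placeholders for my real model and representative dataset):

```python
import numpy as np
import tensorflow as tf

# Placeholder model; my real model is larger, but the conversion steps are the same.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(34,)),
    tf.keras.layers.Dense(34),
])

def representative_dataset():
    # Placeholder calibration data; I use real samples with the model's input shape.
    for _ in range(100):
        yield [np.random.rand(1, 34).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()  # <- this is the call that raises the RuntimeError
```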
I have used Netron to extract some details regarding 'node 26' from the quantized tflite model:
*Node properties ->
type: FullyConnected, location: 26.
*Attributes ->
asymmetric_quantization: false, fused_activation: NONE, keep_num_dims: false, weights_format: DEFAULT.
*Inputs ->
input. name: functional_3/tf_op_layer_Reshape/Reshape;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape/Reshape
type: int8[1,34]
quantization: 0 ≤ 0.007448929361999035 * (q - -128) ≤ 1.8994770050048828
location: 98
weights. name: functional_3/tf_op_layer_MatMul_54/MatMul_54;StatefulPartitionedCall/functional_3/tf_op_layer_MatMul_54/MatMul_54
type: int8[34,34]
quantization: -0.3735211491584778 ≤ 0.002941111335530877 * q ≤ 0.1489555984735489
location: 42
[weights omitted to save space]
bias. name: functional_3/tf_op_layer_AddV2_93/AddV2_3/y;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2_93/AddV2_3/y
type: int32[34]
quantization: 0.0002854724007192999 * q
location: 21
[13,-24,-19,-9,4,59,-18,9,14,-15,13,6,12,5,10,-2,-14,16,11,-1,12,7,-4,16,-8,6,-17,-7,9,-15,7,-29,5,3]
*Outputs ->
output. name: functional_3/tf_op_layer_AddV2/AddV2;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2/AddV2;functional_3/tf_op_layer_Reshape_99/Reshape_99/shape;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape_99/Reshape_99/shape;functional_3/tf_op_layer_Reshape_1/Reshape_1;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape_1/Reshape_1;functional_3/tf_op_layer_AddV2_93/AddV2_3/y;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2_93/AddV2_3/y
type: int8[1,34]
quantization: -0.46506571769714355 ≤ 0.0031077787280082703 * (q - 22) ≤ 0.32741788029670715
location: 99
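If I read tensorflow/lite/kernels/kernel_util.cc correctly, the failing check compares the bias scale against the product of the input and weight scales, relative to the output scale. Plugging in the scales Netron reports for node 26 above shows how far off the bias scale is (this is my own reconstruction of the check, not the exact kernel code):

```python
# Quantization scales copied from the Netron dump of node 26 above.
input_scale = 0.007448929361999035    # input tensor scale
weight_scale = 0.002941111335530877   # weights tensor scale
bias_scale = 0.0002854724007192999    # bias tensor scale
output_scale = 0.0031077787280082703  # output tensor scale

# For int8 FULLY_CONNECTED, TFLite expects bias_scale ~= input_scale * weight_scale.
expected_bias_scale = input_scale * weight_scale
scale_diff = abs(expected_bias_scale - bias_scale)

# kernel_util.cc requires scale_diff / output_scale <= 0.02; here it is ~0.085,
# more than 4x the tolerance, so allocate_tensors() rejects the node.
ratio = scale_diff / output_scale
print(f"expected {expected_bias_scale:.3e}, actual {bias_scale:.3e}, ratio {ratio:.4f}")
```

So the bias scale in my model is roughly an order of magnitude larger than what the kernel expects, which matches the error message.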