TFlite interpreter raises a RuntimeError when allocating tensors for a quantized model. Failed assertion involving scale_diff and output_scale

Question

Dear developers and NN enthusiasts, I have quantized a model (8-bit post-training quantization) and I'm trying to do inference with the resulting model using tflite interprter.

In some cases the interpreter runs properly, and I can do inference on the quantized model as expected, with outputs close enough to the original model. Thus, my setup appears to be correct. However, depending on the concrete quantized model, I frequently stumble across the following RuntimeError.

Traceback (most recent call last):
    File ".\quantize_model.py", line 328, in <module>
        interpreter.allocate_tensors()
    File "---path removed---tf-nightly_py37\lib\site-packages\tensorflow\lite\python\interpreter.py", line 243, in allocate_tensors
        return self._interpreter.AllocateTensors()
RuntimeError: tensorflow/lite/kernels/kernel_util.cc:154 scale_diff / output_scale <= 0.02 was not true.Node number 26 (FULLY_CONNECTED) failed to prepare.

Since the error appears to be related to the scale of the bias, I have retrained the original model using a bias_regularizer. However, the error persists.

Do you have any suggestion on how to avoid this error? should I train or design the model in a different way? Is it possible to suppress this error and continue as usual (even if the accuracy is reduced)?

I have used Netron to extracted some details regarding 'node 26' from the quantized tflite model:

*Node properties ->
type: FullyConnected, location:26. *Attributes asymmetric_quantization: false, fused_activation: NONE, keep_num_dims: false, weights_format: DEFAULT. 
*Inputs ->
input. name: functional_3/tf_op_layer_Reshape/Reshape;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape/Reshape
type: int8[1,34]
quantization: 0 ≤ 0.007448929361999035 * (q - -128) ≤ 1.8994770050048828
location: 98
weights. name: functional_3/tf_op_layer_MatMul_54/MatMul_54;StatefulPartitionedCall/functional_3/tf_op_layer_MatMul_54/MatMul_54
type: int8[34,34]
quantization: -0.3735211491584778 ≤ 0.002941111335530877 * q ≤ 0.1489555984735489
location: 42
[weights omitted to save space]
bias. name: functional_3/tf_op_layer_AddV2_93/AddV2_3/y;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2_93/AddV2_3/y
type: int32[34]
quantization: 0.0002854724007192999 * q
location: 21
[13,-24,-19,-9,4,59,-18,9,14,-15,13,6,12,5,10,-2,-14,16,11,-1,12,7,-4,16,-8,6,-17,-7,9,-15,7,-29,5,3]
*outputs ->
output. name: functional_3/tf_op_layer_AddV2/AddV2;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2/AddV2;functional_3/tf_op_layer_Reshape_99/Reshape_99/shape;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape_99/Reshape_99/shape;functional_3/tf_op_layer_Reshape_1/Reshape_1;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape_1/Reshape_1;functional_3/tf_op_layer_AddV2_93/AddV2_3/y;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2_93/AddV2_3/y
type: int8[1,34]
quantization: -0.46506571769714355 ≤ 0.0031077787280082703 * (q - 22) ≤ 0.32741788029670715
location: 99

score 1 · Answer 1 · answered Jul 22 '20 at 07:46

I have found a workaround, which involves manually modifing the quantized tflite model. This is the file that triggers the RuntimeError in question (tensorflow/lite/kernels/kernel_utils.cc):

// TODO(ahentz): The following conditions must be guaranteed by the training pipeline.
...
const double scale_diff = std::abs(input_product_scale - bias_scale);
const double output_scale = static_cast<double>(output->params.scale);
TF_LITE_ENSURE(context, scale_diff / output_scale <= 0.02);

The comment makes clear that some functionality in model quantization still needs to be completed. The failing condition is related to the scale of the bias. I verified that my quantized model does not fulfill the constraint above. In order to manually fix the quantized model, these steps can be done:

Open the quantized model using Netron and find the node causing the trouble (in my case it is node 26)
Extract the scale of the bias, input, and weights for this node.
Since the bias is represented using dynamic range quantization, the representation is not unique. One can create another representation by scaling the bias scale and bias values (bias zero point is zero and it does not need to be changed). Find a factor k, such that abs(input_scale * weight_scale - bias_scale * k) < 0.02.
Use an hex editor (e.g., ghex in Ubuntu) to edit the tflite model. Replace the incorrect bias_scale with bias_scale * k. You also need to replace the bias values by bias_values / k. Bias_scale is encoded in 32-bit ieee 754 format, little endian (ieee-754 tool), whereas the bias_values are encoded in int32 format little endian.
Save the edited tflite model, it should now pass the required condition, can be used with tflite interpreter, and the model is equivalent to the model before the fix.

Of course, this solution is only a temporary workaround useful until the code in tensorflow's quantizer is corrected.

I am also seeing this issue, is there an github issue tracking this? — mtngld, Nov 04 '20 at 00:31

score 0 · Answer 2 · answered Dec 08 '20 at 04:48

I have another approach that overcomes my problem and share with you guys. According to quantization file The quantization for activation only support with Relu and Identity. It may fail if we miss the biasAdd before Relu activation, therefore, we can wrap the layer as an identity to bypass this by tf.identity. I have tried and it works for my case without editing anything in the cpp files.

TFlite interpreter raises a RuntimeError when allocating tensors for a quantized model. Failed assertion involving scale_diff and output_scale

2 Answers2