I have a model that I post-training-quantized in two different ways, and I am inspecting the results with Netron.
- Model 1 is quantized with TensorFlow version 1.3;
- Model 2 is quantized with TensorFlow version 1.15.3 and additionally uses input and output quantization.
Model 2 uses the following converter settings:

```python
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
```
As TF 1.3 (model 1) does not offer converter optimizations, none are used there. Post-training quantization is not set to `True` in either case.
Inspecting the models, I find different values for the layer input and filter, and I would like to understand what I am seeing.
Can you tell me what I am seeing when the input quantization looks like this for model 1:

quantization: 0 ≤ q ≤ 255

or like this for model 2:

quantization: 0 ≤ 0.03501436859369278 * (q - -128) ≤ 8.928664207458496
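To make my question concrete: my current reading (which may be wrong) is that the model 2 annotation follows the affine mapping real = scale * (q - zero_point), with the scale and zero point taken straight from the Netron line above. A small sketch of that reading:

```python
# My reading of model 2's annotation: real = scale * (q - zero_point).
# scale and zero_point are copied from the Netron output above (assumption).
scale = 0.03501436859369278
zero_point = -128

def dequantize(q: int) -> float:
    """Map an int8 value q back to a real number (affine scheme)."""
    return scale * (q - zero_point)

# int8 spans -128..127, so the representable real range would be:
lo = dequantize(-128)  # 0.0
hi = dequantize(127)   # ~8.92866, close to the upper bound Netron shows

print(lo, hi)
```

If that reading is right, the two bounds in the Netron line are simply the real values reachable at q = -128 and q = 127. Is that correct?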
Also, in model 2 I see the quantization on the input, while model 1 has the quantization on the filters. Could you explain why I see that? I thought model 2, using input and output quantization, would also have quantized filters.
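For context, what I expected to see on the filters was a per-tensor scale with a zero point of 0, roughly like symmetric int8 weight quantization. This is a sketch of my expectation, not necessarily what TFLite actually does:

```python
# Sketch of symmetric int8 weight quantization as I understand it
# (assumption: scale = max(|w|) / 127, zero_point = 0).
def quantize_weights(weights):
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

weights = [0.5, -1.2, 0.9, 1.27]
q, scale = quantize_weights(weights)
print(q, scale)  # quantized values lie in -127..127
```

Is this (or a per-axis variant of it) what the filter quantization parameters in Netron represent?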
Thank you for explaining the differences.
I would also be happy with a link to more documentation on what the quantization values for input/output and filters should look like; I could not find anything.
Attached is a screenshot for better understanding (left: model 1, right: model 2):