I have a model that I post-training-quantized in two different ways, and I am inspecting the results with Netron.
- Model 1 is quantized with TensorFlow version 1.3;
- Model 2 is quantized with TensorFlow version 1.15.3 and additionally uses input and output quantization.
Model 2 uses the following converter settings:

```python
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
```
As TF 1.3 (model 1) does not offer converter optimizations, none are used there. Post-training quantization is not set to `True` in either case.
Inspecting the models, I find different values for the layer input and filter, and I would like to understand what I am seeing.
Can you tell me what I am seeing when the input quantization looks like this for model 1:

quantization: 0 ≤ q ≤ 255

or like this for model 2:

quantization: 0 ≤ 0.03501436859369278 * (q - -128) ≤ 8.928664207458496
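To make my question concrete: my current reading (which may be wrong) is that the model 2 annotation follows the affine mapping real = scale * (q - zero_point), with the scale and zero point taken straight from the Netron line above. A small sketch of that reading:

```python
# My reading of model 2's annotation: real = scale * (q - zero_point).
# scale and zero_point are copied from the Netron output above (assumption).
scale = 0.03501436859369278
zero_point = -128

def dequantize(q: int) -> float:
    """Map an int8 value q back to a real number (affine scheme)."""
    return scale * (q - zero_point)

# int8 spans -128..127, so the representable real range would be:
lo = dequantize(-128)  # 0.0
hi = dequantize(127)   # ~8.92866, close to the upper bound Netron shows

print(lo, hi)
```

If that reading is right, the two bounds in the Netron line are simply the real values reachable at q = -128 and q = 127. Is that correct?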
Also, in model 2 I see the quantization on the input, while model 1 has the quantization on the filters. Could you explain why I see that? I thought model 2, using input and output quantization, would also have quantized filters.
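For context, what I expected to see on the filters was a per-tensor scale with a zero point of 0, roughly like symmetric int8 weight quantization. This is a sketch of my expectation, not necessarily what TFLite actually does:

```python
# Sketch of symmetric int8 weight quantization as I understand it
# (assumption: scale = max(|w|) / 127, zero_point = 0).
def quantize_weights(weights):
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

weights = [0.5, -1.2, 0.9, 1.27]
q, scale = quantize_weights(weights)
print(q, scale)  # quantized values lie in -127..127
```

Is this (or a per-axis variant of it) what the filter quantization parameters in Netron represent?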
Thank you for explaining the differences.
I would also be happy with a link to more documentation on what the quantization values for input/output and filters should look like; I could not find anything.
Attached is a screenshot for better understanding (left: model 1, right: model 2):