I want to understand the basic operation performed in a convolution layer of a quantized model in TensorFlow Lite.
As a baseline, I chose a pretrained TensorFlow model, EfficientNet-lite0-int8, and ran inference on a sample image. I then extracted the output tensor of the first fused ReLU6 convolution layer and compared it with the output of my own Python implementation of that layer.
The deviation between the two tensors was large, and something I cannot explain is that TensorFlow Lite's output tensor was not within the range [0, 6] as I expected (I expected that because of the ReLU6 fused into the Conv layer).
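My current guess is that the fused ReLU6 is applied as a clamp on the *quantized* int8 values, so only the dequantized values should land in [0, 6]. Here is a small sketch of that idea, using made-up scale/zero-point values (not the real EfficientNet-lite0 parameters):

```python
import numpy as np

# Hypothetical output quantization parameters (NOT taken from the real model)
scale, zero_point = 0.0235, -128  # int8 output tensor

# Fused ReLU6 as a clamp in the quantized domain:
act_min = zero_point + round(0.0 / scale)        # quantize(0.0)
act_max = min(zero_point + round(6.0 / scale), 127)  # quantize(6.0), clipped to int8

q = np.array([-128, -50, 0, 100, 127], dtype=np.int32)
q_clamped = np.clip(q, act_min, act_max).astype(np.int8)

# Only after dequantization do the values land in [0, 6]:
real = scale * (q_clamped.astype(np.float32) - zero_point)
print(q_clamped)  # raw int8 values, NOT in [0, 6]
print(real)       # dequantized values, all within [0, 6]
```

If that is right, it would explain why the raw int8 tensor I extracted spans the full [-128, 127] range, but please correct me if the clamp works differently.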
Could you please give a more detailed description of how a quantized Conv2D layer with a fused ReLU6 operates in TensorFlow Lite?
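For reference, here is my current understanding of the computation for a single output value, as a NumPy sketch. All quantization parameters, shapes, and the bias value are made up for illustration; I am also assuming symmetric int8 weights (weight zero point = 0) and using a float multiplier where TFLite uses a fixed-point one:

```python
import numpy as np

rng = np.random.default_rng(0)

s_in, zp_in = 0.02, -1          # hypothetical input scale / zero point (int8)
s_w = 0.005                     # hypothetical weight scale, weight zero point assumed 0
s_out, zp_out = 0.0235, -128    # hypothetical output scale / zero point

q_in = rng.integers(-128, 128, size=(3, 3, 8), dtype=np.int8)  # one 3x3x8 input patch
q_w  = rng.integers(-127, 128, size=(3, 3, 8), dtype=np.int8)  # one 3x3x8 filter
bias = np.int32(1234)           # bias stored as int32 with scale s_in * s_w

# 1) Integer accumulation in int32, zero points subtracted first
acc = np.sum((q_in.astype(np.int32) - zp_in) * q_w.astype(np.int32)) + bias

# 2) Requantize: scale the accumulator by M = (s_in * s_w) / s_out
#    (a float stand-in for TFLite's fixed-point multiplier)
M = (s_in * s_w) / s_out
q_out = int(np.round(acc * M)) + zp_out

# 3) Fused ReLU6 as a clamp in the quantized domain
act_min = zp_out + round(0.0 / s_out)
act_max = min(zp_out + round(6.0 / s_out), 127)
q_out = int(np.clip(q_out, act_min, act_max))

real_out = s_out * (q_out - zp_out)  # dequantized value, should be in [0, 6]
print(q_out, real_out)
```

Is this roughly what the TFLite kernel does, and if so, where would my implementation most likely diverge from it (rounding, the fixed-point multiplier, per-channel scales)?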