I've been studying quantization using TensorFlow's TFLite. As far as I understand, it is possible to quantize my model weights (so that they are stored using 4x less memory), but that doesn't necessarily imply that the interpreter won't convert them back to floats at runtime. I've also understood that to run my model using only integer operations I need to set the following parameters:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
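For context, here is the full converter setup I'm running. This is a minimal sketch: the SavedModel path "my_model" and the representative_data array are placeholders for my actual model and calibration data.

import numpy as np
import tensorflow as tf

# Placeholder calibration data with the same shape/dtype as the training inputs.
representative_data = np.random.rand(100, 224, 224, 3).astype(np.float32)

def representative_dataset():
    # Yield a batch of one sample at a time so the converter can calibrate
    # the activation ranges needed for full integer quantization.
    for sample in representative_data:
        yield [np.expand_dims(sample, axis=0)]

converter = tf.lite.TFLiteConverter.from_saved_model("my_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)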
I'd like to know what the differences are in the tf.lite.Interpreter between a loaded model in which those parameters were set and one in which they weren't. I tried to investigate .get_tensor_details() for that, but I didn't notice any difference.
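Concretely, this is how I'm inspecting the two converted files. It's a minimal sketch; the filenames model_float.tflite and model_int8.tflite are placeholders for the models converted without and with those parameters.

import tensorflow as tf

for path in ["model_float.tflite", "model_int8.tflite"]:
    interpreter = tf.lite.Interpreter(model_path=path)
    interpreter.allocate_tensors()

    print(path)
    # Input/output details report the dtype and (scale, zero_point)
    # the interpreter expects at the model boundaries.
    print(interpreter.get_input_details()[0]["dtype"],
          interpreter.get_input_details()[0]["quantization"])
    print(interpreter.get_output_details()[0]["dtype"],
          interpreter.get_output_details()[0]["quantization"])

    # Per-tensor details; each entry also carries "dtype" and "quantization".
    for detail in interpreter.get_tensor_details():
        print(detail["name"], detail["dtype"], detail["quantization"])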