TFLite model conversion lets you automatically quantize or dequantize the inputs and outputs of a model. You do this by setting inference_input_type and inference_output_type appropriately, like this:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen  # generator yielding sample inputs
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
However, as of TensorFlow 2.7, TFLite models finally support multiple signatures, which can be retrieved automatically from SavedModels, Keras models, or concrete functions. This raises the question: how can you set quantization/dequantization for inputs and outputs at the signature level? And how do you do that when a signature has multiple inputs or outputs?
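For concreteness, here is a minimal sketch of the kind of setup I mean, assuming a SavedModel that exports two signatures (saved_model_dir, "encode", and "decode" are placeholder names, not real APIs beyond the converter/interpreter calls):

import tensorflow as tf

# Convert a SavedModel that exports more than one signature.
converter = tf.lite.TFLiteConverter.from_saved_model(
    saved_model_dir, signature_keys=["encode", "decode"])
tflite_model = converter.convert()

# Both signatures survive conversion and can be listed by the interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
print(interpreter.get_signature_list())
# e.g. {'encode': {'inputs': [...], 'outputs': [...]}, 'decode': {...}}

But as far as I can tell, inference_input_type and inference_output_type still apply globally to the converted model, not per signature.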
It seems like inference_input_type and inference_output_type are limited to whatever single-input (maybe also single-output?) function the model exports via its call method. Any tips on how to handle quantization/dequantization for specific arguments in different signatures, even if only manually, would be most welcome.