TFLite model conversion lets you automatically quantize or dequantize the inputs and outputs of a model. You do this by setting inference_input_type and inference_output_type appropriately, like this:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen  # generator yielding sample inputs
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
However, as of TensorFlow 2.7, TFLite models finally support multiple signatures, which can be retrieved automatically from SavedModels, Keras models, or concrete functions. This raises the question: how can you set quantization/dequantization for inputs and outputs at the signature level? And how do you do that when a signature has multiple inputs or outputs?
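For concreteness, here is a minimal sketch of the kind of setup I mean, assuming a SavedModel that exports two signatures (saved_model_dir, "encode", and "decode" are placeholder names, not real APIs beyond the converter/interpreter calls):

import tensorflow as tf

# Convert a SavedModel that exports more than one signature.
converter = tf.lite.TFLiteConverter.from_saved_model(
    saved_model_dir, signature_keys=["encode", "decode"])
tflite_model = converter.convert()

# Both signatures survive conversion and can be listed by the interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
print(interpreter.get_signature_list())
# e.g. {'encode': {'inputs': [...], 'outputs': [...]}, 'decode': {...}}

But as far as I can tell, inference_input_type and inference_output_type still apply globally to the converted model, not per signature.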
It seems like inference_input_type and inference_output_type are limited to whatever single-input (maybe also single-output?) function the model exports via its call method. Any tips on how to handle quantization/dequantization for specific arguments in different signatures, even if only manually, would be most welcome.