Is it possible to configure TFLite to return a model with bias quantized to int8?

Question

I'm working with Keras/Tensorflow to develop an ANN that will be deployed to a low-end MCU. For this purpose, I have quantized the original ANN using the post-training quantization mechanism offered by Tensorflow Lite. If the weights are indeed quantized to int8, biases were converted from float to int32. Considering that I pretend to implement this ANN in CMSIS-NN, this is a problem as they only support int8 and int16 data.

Is it possible to configure TF Lite to also quantize biases to int8? Below follows the code I am executing:

def quantizeToInt8(representativeDataset):
    # Cast the dataset to float32
    data = tf.cast(representativeDataset, tf.float32)
    data = tf.data.Dataset.from_tensor_slices((data)).batch(1)

    # Generator function that returns one data point per iteration
    def representativeDatasetGen():
        for inputValue in data:
            yield[inputValue]
    
    # ANN quantization
    model = tf.keras.models.load_model("C:/Users/miguel/Documents/Universidade/PhD/Code_Samples/TensorFlow/originalModel.h5")

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representativeDatasetGen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.target_spec.supported_types = [tf.int8]
    converter.inference_type = tf.int8
    converter.inference_input_type = tf.int8  # or tf.uint8
    converter.inference_output_type = tf.int8  # or tf.uint8
    tflite_quant_model = converter.convert()

    return tflite_quant_model

It's not possible to configure TFLite to do that. Biases are intentionally int32 otherwise the quantization accuracy would not be good. In order to make this work, you'd have to add a new op or custom op and then come up with a custom quantization tooling all together. Do you have any other workaround, eg: another MCU? — Meghna Natraj, Aug 10 '20 at 17:31

score 0 · Accepted Answer · answered Oct 22 '20 at 17:38

From Comments

It's not possible to configure TFLite to do that. Biases are intentionally int32 otherwise the quantization accuracy would not be good. In order to make this work, you'd have to add a new op or custom op and then come up with a custom quantization tooling all together.(paraphrased from Meghna Natraj).

Is it possible to configure TFLite to return a model with bias quantized to int8?

1 Answers1