I have a CNN that already works, but now I need to deploy it on specific hardware. For that, I've been told to quantize the model, since the hardware can only use integer operations.
I read a good solution here: How to make sure that TFLite Interpreter is only using int8 operations?
And I wrote this code to make it work:
import tensorflow as tf

model_file = "models/my_cnn.h5"
# load model
model = tf.keras.models.load_model(model_file, custom_objects={'tf': tf}, compile=False)
# convert
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint16 # or tf.uint8
converter.inference_output_type = tf.uint16 # or tf.uint8
qmodel = converter.convert()
with open('thales.tflite', 'wb') as f:
    f.write(qmodel)
interpreter = tf.lite.Interpreter(model_content=qmodel)
interpreter.allocate_tensors()
# predict
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details)
print(output_details)
image = read_image("test.png")
interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
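For completeness, `representative_dataset` is not shown above; mine looks roughly like this (a sketch, with random data standing in for my real calibration images):

```python
import numpy as np

def representative_dataset():
    # Yield a few calibration samples matching the model's input
    # shape (1, 160, 160, 3); my real code loads actual images.
    for _ in range(100):
        data = np.random.rand(1, 160, 160, 3).astype(np.float32)
        yield [data]
```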
Looking at the printed output, first the details:
input_details
[{'name': 'input_1', 'index': 87, 'shape': array([ 1, 160, 160, 3], dtype=int32), 'shape_signature': array([ 1, 160, 160, 3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
output_details
[{'name': 'Identity', 'index': 88, 'shape': array([ 1, 160, 160, 1], dtype=int32), 'shape_signature': array([ 1, 160, 160, 1], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
And the output of the quantized model is:
...
[[0. ]
[0. ]
[0. ]
...
[0.00390625]
[0.00390625]
[0.00390625]]
[[0. ]
[0. ]
[0. ]
...
[0.00390625]
[0.00390625]
[0.00390625]]]]
So, I have several problems here:
In the input/output details we can see int32, but I specified uint16 in the code.
Also in the input/output details, "float32" appears several times as the dtype, and I don't understand why.
Finally, the biggest problem is that the output contains float numbers, which should not happen. So it looks like the model is not really converted to integers.
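One observation: 0.00390625 is exactly 1/256, so it looks like a dequantized uint8 value. The usual affine dequantization is real = scale * (quantized - zero_point), which reproduces exactly the float I see (assuming a scale of 1/256 and zero point 0, which are my guesses, not values read from the model):

```python
# Affine dequantization: real = scale * (q - zero_point).
# With scale = 1/256 and zero_point = 0, a quantized value of 1
# maps to 0.00390625 -- exactly the float seen in my output.
scale, zero_point = 1.0 / 256.0, 0
q = 1
real = scale * (q - zero_point)
print(real)  # 0.00390625
```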
How can I really quantize my CNN, and why is this code not working?