
I have a CNN already working, but now I need to deploy it on some specific hardware. For that, I've been told to quantize the model, since the hardware can only perform integer operations.

I read a good solution here: How to make sure that TFLite Interpreter is only using int8 operations?

And I wrote this code to make it work:

import tensorflow as tf

model_file = "models/my_cnn.h5"

# load the Keras model
model = tf.keras.models.load_model(model_file, custom_objects={'tf': tf}, compile=False)

# convert to a quantized TFLite model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration generator, sketched below
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint16  # or tf.uint8
converter.inference_output_type = tf.uint16  # or tf.uint8
qmodel = converter.convert()
with open('thales.tflite', 'wb') as f:
    f.write(qmodel)

interpreter = tf.lite.Interpreter(model_content=qmodel)
interpreter.allocate_tensors()

# inspect input/output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details)
print(output_details)

image = read_image("test.png")  # user-defined image loading helper

# predict
interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
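
The representative_dataset used by the converter is a calibration generator that is not shown above. A minimal sketch, assuming float inputs of shape (1, 160, 160, 3) as reported in the input details below, could look like this (the random data here only stands in for real calibration images):

import numpy as np

def representative_dataset():
    # yield a few hundred samples that cover the input distribution;
    # each yielded item is a list with one array per model input
    for _ in range(100):
        yield [np.random.rand(1, 160, 160, 3).astype(np.float32)]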

When we look at the printed output, we first see the details:

input_details

[{'name': 'input_1', 'index': 87, 'shape': array([  1, 160, 160,   3], dtype=int32), 'shape_signature': array([  1, 160, 160,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

output_details

[{'name': 'Identity', 'index': 88, 'shape': array([  1, 160, 160,   1], dtype=int32), 'shape_signature': array([  1, 160, 160,   1], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

And the output of the quantized model is:

...
[[0.        ]
[0.        ]
[0.        ]
...
[0.00390625]
[0.00390625]
[0.00390625]]

[[0.        ]
[0.        ]
[0.        ]
...
[0.00390625]
[0.00390625]
[0.00390625]]]]

So, I have several problems here:

  1. In the input/output details we can see that the input/output layers are int32, but I specified uint16 in the code.

  2. Also in the input/output details, "float32" appears several times as the dtype, and I don't understand why.

  3. Finally, the biggest problem is that the output contains floating-point numbers, which should not happen. So it looks like the model was not really converted to integers.

How can I really quantize my CNN, and why is this code not working?


1 Answer


`converter.inference_input_type` and `converter.inference_output_type` support only `tf.int8` or `tf.uint8`, not `tf.uint16`.
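
For example, a full-integer conversion with 8-bit input/output could look like this (a minimal sketch based on the question's code, changing only the dtype lines):

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
qmodel = converter.convert()

# verify that the interpreter now reports integer dtypes
interpreter = tf.lite.Interpreter(model_content=qmodel)
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]['dtype'])   # expected: <class 'numpy.int8'>
print(interpreter.get_output_details()[0]['dtype'])  # expected: <class 'numpy.int8'>

If the printed dtypes are still float32, the quantization did not take effect, which usually points to a problem with the representative dataset or to layers that cannot be quantized (see the comments below).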

sakumoil
  • 602
  • 4
  • 11
  • As you can see in the comment, I also tried `tf.uint8`, but it still does not work. :( – Sergio Ferrer Sánchez Jun 07 '21 at 09:31
  • Can you share your model file? What kind of architecture are you trying to quantize? Does your representative dataset generator work? The quantization scale and zero point in your input and output details are both 0. – sakumoil Jun 07 '21 at 11:00
  • No, I cannot share the model; it is NDA-protected. The architecture is "some kind of UNET". Yes, the generator works as expected. – Sergio Ferrer Sánchez Jun 07 '21 at 14:00
  • I managed to successfully quantize a dummy model using your code above. `input_details`: ```[{'name': 'input_1', 'index': 27, 'shape': array([ 1, 160, 160, 3], dtype=int32), 'shape_signature': array([ -1, 160, 160, 3], dtype=int32), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.003921546973288059, 0), 'quantization_parameters': {'scales': array([0.00392155], dtype=float32), 'zero_points': array([0], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]``` Try taking a look into your model to see if there are some layers that cannot be quantized at all. – sakumoil Jun 08 '21 at 06:00
  • Thanks a lot, I'll look at it. – Sergio Ferrer Sánchez Jun 08 '21 at 12:51
  • @sakumoil, so these types are not supported by the ```converter```, but ```tf lite``` itself supports them? – Gleichmut Aug 14 '23 at 14:56