
I quantized a Keras .h5 model (TF 1.13; keras_vggface model) with TensorFlow 1.15.3 to use it with an NPU. The code I used for the conversion is:

converter = tf.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + modelname)  
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()

The quantized model I get looks good at first sight: the input types of the layers are int8, the filters are int8, the biases are int32, and the outputs are int8.

However, the model has a quantize layer after the input layer, and the input layer itself is float32 [see image below]. It seems that the NPU also needs the input to be int8.

Is there a way to fully quantize the model, without a conversion layer, so that the input is int8 as well?

As you can see above, I already used:

 converter.inference_input_type = tf.int8
 converter.inference_output_type = tf.int8
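
One way to check what the converter actually produced is to inspect the input and output tensor details with the TFLite interpreter. This is only a minimal sketch; the file name quantized_model.tflite is a placeholder for the converted model:

import tensorflow as tf

# Load the converted model and inspect its input/output tensors.
interpreter = tf.lite.Interpreter(model_path="quantized_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# dtype shows whether the graph input is really int8 or still float32;
# 'quantization' holds the (scale, zero_point) pair used by the quantize layer.
print(input_details[0]['dtype'], input_details[0]['quantization'])
print(output_details[0]['dtype'], output_details[0]['quantization'])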

EDIT

SOLUTION from user dtlam

Even though the model still does not run with the Google NNAPI, the solution to quantize the model with int8 input and output using either TF 1.15.3 or TF 2.2.0 is, thanks to dtlam26:

import cv2
import numpy as np
import tensorflow as tf

# ... (saved_model_dir and modelname are defined elsewhere)
converter = tf.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + modelname)

def representative_dataset_gen():
    # Yield a handful of sample inputs in the same format the model was trained on.
    for _ in range(10):
        pfad = 'pathtoimage/000001.jpg'
        img = cv2.imread(pfad)
        img = np.expand_dims(img, 0).astype(np.float32)
        yield [img]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.experimental_new_converter = True

converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
quantized_tflite_model = converter.convert()

if tf.__version__.startswith('1.'):
    open("test153.tflite", "wb").write(quantized_tflite_model)
if tf.__version__.startswith('2.'):
    with open("test220.tflite", 'wb') as f:
        f.write(quantized_tflite_model)
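
To run the fully quantized model, the float input has to be mapped to int8 with the scale and zero point stored in the input tensor. A minimal sketch (the file name test220.tflite comes from the code above; the image path is a placeholder):

import cv2
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="test220.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

img = cv2.imread('pathtoimage/000001.jpg')
img = np.expand_dims(img, 0).astype(np.float32)

# Quantize the float input with the model's input scale and zero point.
scale, zero_point = inp['quantization']
img_int8 = np.clip(np.round(img / scale + zero_point), -128, 127).astype(np.int8)

interpreter.set_tensor(inp['index'], img_int8)
interpreter.invoke()
result = interpreter.get_tensor(out['index'])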

[Image: quantized model graph showing the float32 input followed by the quantize layer]

Florida Man

1 Answer


If you applied post-training quantization, you have to make sure your representative dataset is not in float32. Furthermore, if you want to reliably quantize the model with int8 or uint8 input/output, you should consider using quantization-aware training. This also gives you better results in quantization.
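
A minimal sketch of quantization-aware training with the TensorFlow Model Optimization Toolkit (model, x_train and y_train are placeholders for your pretrained Keras model and fine-tuning data):

import tensorflow_model_optimization as tfmot

# Wrap the pretrained float Keras model with fake-quantization nodes.
q_aware_model = tfmot.quantization.keras.quantize_model(model)

q_aware_model.compile(optimizer='adam',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])

# A short fine-tuning run is usually enough for the weights to adapt to the quantization noise.
q_aware_model.fit(x_train, y_train, epochs=1, validation_split=0.1)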

I also tried to quantize your model from the image and code you gave me, and it is quantized after all. [Image: the resulting quantized model]

dtlam26
  • Ah, that is good to know. I will try this ASAP. Quantization-aware training will be tried as a last resort :-) The beauty of the model is that it is quite well pretrained. Can you also use quantization-aware training during fine-tuning? So just retrain the model on a smaller int8 set using quantization-aware training? Thanks a lot and have a safe and healthy day. – Florida Man Sep 09 '20 at 08:02
  • Yeah, QAT helps the model operate better in lower bit dimensions ;)). As I see from your model, if you use the QAT from TF 2.0, the quantized model cannot have uint8 or int8 input. – dtlam26 Sep 09 '20 at 11:51
  • Here they have clearly stated that you should use TF 1.x: https://coral.ai/docs/edgetpu/faq/ – dtlam26 Sep 09 '20 at 12:13
  • Thanks so much. Good to know that TF2 only supports float32 input. I was not aware of this (I am not using Google's TPU, so I missed this document, cheers). – Florida Man Sep 09 '20 at 12:45
  • The TPU is powerful but tricky to play with. I have spent a year with this, so feel free to ask if you have any problems. – dtlam26 Sep 09 '20 at 13:04
  • Great to hear. Most likely I will come back to that offer ;-) Talk to you soon. Cheers, Jan – Florida Man Sep 09 '20 at 14:11
  • Hi, I am coming back to your offer. When I changed the input dtype of the data as you suggested, I ran into an error I could not solve. If your time allows, would you take a look? Cheers https://stackoverflow.com/questions/63830570/full-quatization-does-not-except-int8-data-to-change-model-input-layer-to-int8 – Florida Man Sep 10 '20 at 13:19
  • Can you show me the code of your representative dataset? – dtlam26 Sep 10 '20 at 13:31
  • If possible, also give me your Keras model. – dtlam26 Sep 10 '20 at 13:51
  • Hi, here you will find the extracted code: https://github.com/JanderHungrige/Right-quantization and a sample image. I just load a sample image in the same format as the training used. I will upload the model, but it is too large for git. However, at the top there is a quick description of how to install keras_vgg; it is a simple pip command. – Florida Man Sep 10 '20 at 14:11
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/221269/discussion-between-florida-man-and-dtlam26). – Florida Man Sep 10 '20 at 14:15
  • <3 <3 <3 <3 <3 <3 – dtlam26 Sep 11 '20 at 02:21