I am trying to find a solution for running face recognition on an AI camera, and I found that MobileFacenet (code from sirius-ai) is a great lightweight model!

I succeeded in converting to TFLITE in F32 format with good accuracy. However, I failed when quantizing to uint8 with the following command:

tflite_convert \
  --output_file tf-lite/MobileFacenet_uint8_128.tflite \
  --graph_def_file tf-lite/MobileFacenet.pb \
  --input_arrays "input" \
  --input_shapes "1,112,112,3" \
  --output_arrays output \
  --output_format TFLITE \
  --mean_values 128 \
  --std_dev_values 127.5 \
  --default_ranges_min 0 \
  --default_ranges_max 255 \
  --inference_type QUANTIZED_UINT8 \
  --inference_input_type QUANTIZED_UINT8

This thread helped with the conversion to TFLITE, but it does not cover quantization. Could anyone provide some suggestions? Many thanks!!

S.H

1 Answer

Using tflite_convert requires either --saved_model_dir or --keras_model_file to be defined. When using TF 2.x, you should pass --enable_v1_converter if you want to convert a frozen graph to a quantized tflite model.
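
For reference, roughly the same conversion can also be driven from the Python API instead of the CLI. This is only a minimal sketch, assuming TF 2.x (under TF 1.x the same converter is tf.lite.TFLiteConverter, or tf.contrib.lite in 1.12); the paths, tensor names, and mean/std values are copied from the question and may need adjusting:

import tensorflow as tf

# Under TF 2.x the frozen-graph path lives in the v1 compatibility module.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="tf-lite/MobileFacenet.pb",
    input_arrays=["input"],
    output_arrays=["output"],
    input_shapes={"input": [1, 112, 112, 3]})
converter.inference_type = tf.uint8        # QUANTIZED_UINT8
converter.inference_input_type = tf.uint8
converter.quantized_input_stats = {"input": (128.0, 127.5)}  # (mean, std_dev)
converter.default_ranges_stats = (0, 255)  # dummy ranges only, see the EDIT below
with open("tf-lite/MobileFacenet_uint8_128.tflite", "wb") as f:
    f.write(converter.convert())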

EDIT:

What you are currently doing is called "dummy quantization", which can be used to test the inference timings of the quantized network. To properly quantize the network, min/max information of layers should be injected into it with fake quant nodes.

Please see [this gist](https://gist.github.com/crypt3lx2k/cec6ad66b948fe0e77a7b1e6d2205bf4) for example code on how to do it. [This blog post](https://medium.com/analytics-vidhya/mobile-inference-b943dc99e29b) also has some information on quantization-aware training.
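
As a very rough illustration of the fake quant rewrite (not the MobileFacenet training code; the tiny conv net below is only a stand-in, and this assumes TF 1.x with tf.contrib.quantize available):

import tensorflow as tf

# Training graph: build the model, then rewrite it so FakeQuant nodes
# collect per-layer min/max ranges during training.
with tf.Graph().as_default() as train_graph:
    images = tf.placeholder(tf.float32, [None, 112, 112, 3], name="input")
    net = tf.layers.conv2d(images, 8, 3, activation=tf.nn.relu6)
    embeddings = tf.identity(tf.layers.flatten(net), name="output")
    tf.contrib.quantize.create_training_graph(input_graph=train_graph, quant_delay=0)
    # ... add your loss/optimizer here and train as usual, saving checkpoints ...

# Eval graph: the same model, rewritten in inference mode so the frozen .pb
# carries the learned ranges. Freeze it with the trained checkpoint and then
# run tflite_convert with --inference_type QUANTIZED_UINT8; the
# --default_ranges_min/max flags are no longer needed.
with tf.Graph().as_default() as eval_graph:
    images = tf.placeholder(tf.float32, [1, 112, 112, 3], name="input")
    net = tf.layers.conv2d(images, 8, 3, activation=tf.nn.relu6)
    embeddings = tf.identity(tf.layers.flatten(net), name="output")
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)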

sakumoil
  • Thanks for the comment. I am using TF 1.12 and use --graph_def_file tf-lite/MobileFacenet.pb to convert. I thought .pb files and saved models are both supported by TF? – S.H Jul 13 '20 at 01:01
  • They should be. What is the exact error message you get when trying to convert? – sakumoil Jul 13 '20 at 05:04
  • I did not get an error message, but the result is incorrect. I simply compare two face images, get the embeddings from MobileFacenet.pb and from the converted *.tflite, and calculate the Euclidean distance to verify the output (a sketch of this check is given after these comments). For faces of the same person, the distance should be smaller than for faces of different people. The output of the *.pb, or of the *.tflite converted with --post_training_quantize 1, is OK. But when converting to uint8 with the above command, the result is incorrect. – S.H Jul 13 '20 at 05:21
  • What you are currently doing is called "dummy quantization", which can be used to test the inference timings of the quantized network. To properly quantize the network, min/max information of layers should be injected into it with fake quant nodes. Please see [this gist](https://gist.github.com/crypt3lx2k/cec6ad66b948fe0e77a7b1e6d2205bf4) for example code on how to do it. [This blog post](https://medium.com/analytics-vidhya/mobile-inference-b943dc99e29b) also has some information on quantization-aware training. – sakumoil Jul 13 '20 at 05:35
  • Thank you very much! Will go with Quantization Aware Training. – S.H Jul 13 '20 at 06:24
  • You're welcome. I added my previous comment also to the answer. – sakumoil Jul 13 '20 at 07:24
  • Hi @sakumoil, I followed the suggestion to add fake quant nodes but failed. Could you help check this post as well? Thanks! – S.H Jul 27 '20 at 04:50
  • https://stackoverflow.com/questions/63108767/create-training-graph-failed-when-converted-mobilefacenet-to-quantize-aware-mo – S.H Jul 27 '20 at 04:51
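
For completeness, here is a minimal sketch of the distance check described in the comments above. It assumes a uint8 model with a single input and a single output tensor; random arrays stand in for the two aligned 112x112 face crops, and image loading/alignment is left out:

import numpy as np
import tensorflow as tf

def embed(interpreter, face_uint8):
    # Run one uint8 face crop of shape (1, 112, 112, 3) through the TFLite model.
    in_det = interpreter.get_input_details()[0]
    out_det = interpreter.get_output_details()[0]
    interpreter.set_tensor(in_det["index"], face_uint8)
    interpreter.invoke()
    return interpreter.get_tensor(out_det["index"]).astype(np.float32).flatten()

interpreter = tf.lite.Interpreter(model_path="tf-lite/MobileFacenet_uint8_128.tflite")
interpreter.allocate_tensors()

# Placeholder inputs; in practice these would be two aligned face crops.
face_a = np.random.randint(0, 256, (1, 112, 112, 3), dtype=np.uint8)
face_b = np.random.randint(0, 256, (1, 112, 112, 3), dtype=np.uint8)

dist = np.linalg.norm(embed(interpreter, face_a) - embed(interpreter, face_b))
print("Euclidean distance:", dist)  # same person should give a smaller distance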