I'm perplexed by the TensorFlow post-training quantization process. The official site points to TensorFlow Lite quantization. Unfortunately, that doesn't work in my case: TFLiteConverter fails on my Mask RCNN model with the following error:
Some of the operators in the model are not supported by the standard TensorFlow Lite runtime and are not recognized by TensorFlow. If you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: <...>. Here is a list of operators for which you will need custom implementations: DecodeJpeg, StatelessWhile.
Basically, I've tried every option TFLiteConverter offers, including the experimental ones. I'm not too surprised by these errors, since it arguably makes sense not to support DecodeJpeg on mobile. However, I want to serve the model with TensorFlow Serving, so I don't understand why TensorFlow Lite is the officially recommended route.
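For reference, here is a minimal sketch of the kind of conversion I attempted; the SavedModel path, output filename, and the float16 option are placeholders/assumptions rather than my exact script:

```python
import tensorflow as tf

# Placeholder path; the real model is a Mask RCNN SavedModel export.
converter = tf.lite.TFLiteConverter.from_saved_model("mask_rcnn_saved_model")

converter.optimizations = [tf.lite.Optimize.DEFAULT]   # post-training quantization
converter.target_spec.supported_types = [tf.float16]   # optional: float16 weights
converter.allow_custom_ops = True                      # emit unsupported ops as custom ops

tflite_model = converter.convert()
with open("mask_rcnn.tflite", "wb") as f:
    f.write(tflite_model)
```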
I've also tried the Graph Transform Tool, which appears to be deprecated, and ran into two issues. First, it can only quantize to int8; bfloat16 and float16 aren't supported. Second, the quantized model breaks with the error:
Broadcast between [1,20,1,20,1,256] and [1,1,2,1,2,1] is not supported yet
which isn't an issue in the regular (unquantized) model.
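For completeness, this is roughly how I drove the Graph Transform Tool via its Python wrapper; the graph paths, input/output tensor names, and transform list are placeholders, not my actual configuration:

```python
import tensorflow.compat.v1 as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Placeholder file; the real graph is my frozen Mask RCNN.
with tf.io.gfile.GFile("frozen_mask_rcnn.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

transformed = TransformGraph(
    graph_def,
    inputs=["image_tensor"],         # placeholder input tensor name
    outputs=["detections"],          # placeholder output tensor name
    transforms=["quantize_weights"]  # only 8-bit quantization is offered
)

with tf.io.gfile.GFile("quantized_mask_rcnn.pb", "wb") as f:
    f.write(transformed.SerializeToString())
```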
Furthermore, it's worth mentioning that my model was originally built with TensorFlow 1.x and then ported to TensorFlow 2.1 via tensorflow.compat.v1.
This issue has already cost me a significant amount of time, so I'd be grateful for any pointers.