I'm perplexed by the TensorFlow post-training quantization process. The official site points to TensorFlow Lite quantization. Unfortunately, that doesn't work in my case: TFLiteConverter fails on my Mask RCNN model with the following error:
Some of the operators in the model are not supported by the standard TensorFlow Lite runtime and are not recognized by TensorFlow. If you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: <...>. Here is a list of operators for which you will need custom implementations: DecodeJpeg, StatelessWhile.
Basically, I've tried every option TFLiteConverter offers, including the experimental ones. I'm not too surprised by these errors, since it arguably makes sense not to support DecodeJpeg on mobile. However, I want to serve the model with TensorFlow Serving, so I don't understand why TensorFlow Lite is the officially recommended route.
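For reference, here is a minimal sketch of the kind of conversion I attempted; the SavedModel path, output filename, and the float16 option are placeholders/assumptions rather than my exact script:

```python
import tensorflow as tf

# Placeholder path; the real model is a Mask RCNN SavedModel export.
converter = tf.lite.TFLiteConverter.from_saved_model("mask_rcnn_saved_model")

converter.optimizations = [tf.lite.Optimize.DEFAULT]   # post-training quantization
converter.target_spec.supported_types = [tf.float16]   # optional: float16 weights
converter.allow_custom_ops = True                      # emit unsupported ops as custom ops

tflite_model = converter.convert()
with open("mask_rcnn.tflite", "wb") as f:
    f.write(tflite_model)
```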
I've also tried the Graph Transform Tool, which appears to be deprecated, and ran into two issues. First, it can only quantize to int8; bfloat16 and float16 aren't supported. Second, the quantized model breaks with the error:
Broadcast between [1,20,1,20,1,256] and [1,1,2,1,2,1] is not supported yet
which isn't an issue in the regular (unquantized) model.
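For completeness, this is roughly how I drove the Graph Transform Tool via its Python wrapper; the graph paths, input/output tensor names, and transform list are placeholders, not my actual configuration:

```python
import tensorflow.compat.v1 as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Placeholder file; the real graph is my frozen Mask RCNN.
with tf.io.gfile.GFile("frozen_mask_rcnn.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

transformed = TransformGraph(
    graph_def,
    inputs=["image_tensor"],         # placeholder input tensor name
    outputs=["detections"],          # placeholder output tensor name
    transforms=["quantize_weights"]  # only 8-bit quantization is offered
)

with tf.io.gfile.GFile("quantized_mask_rcnn.pb", "wb") as f:
    f.write(transformed.SerializeToString())
```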
Furthermore, it's worth mentioning that my model was originally built with TensorFlow 1.x and then ported to TensorFlow 2.1 via tensorflow.compat.v1.
This issue has already cost me a significant amount of time, so I'd be grateful for any pointers.