I have a wav2letter (speech recognition) model where I am trying to introduce the fakeQuant operations manually. I have managed to insert them in the right places (so that the TFLite converter manages to generate the u8 tflite model), but my problem is that the min/max ranges of these operations are not updated during training. They stay the same (or nearly the same) from the beginning, and the gradients do not seem to flow to them.
I tried defining the fakeQuant in several ways, e.g.:
min_w = tf.get_variable("min_quant_weights", shape=[], initializer=tf.constant_initializer(0), trainable=True)
max_w = tf.get_variable("max_quant_weights", shape=[], initializer=tf.constant_initializer(1), trainable=True)
filters = tf.fake_quant_with_min_max_vars(filters, min=tf.reduce_min(min_w), max=tf.reduce_max(max_w), num_bits=8)
or
min_w = tf.Variable(0.0, name="min_quant_weights")
max_w = tf.Variable(1.0, name="max_quant_weights")
filters = tf.fake_quant_with_min_max_vars(filters, min=min_w, max=max_w)
or
min_w = 0
max_w = 1
filters = tf.quantization.fake_quant_with_min_max_args(filters, min=min_w, max=max_w, num_bits=8, narrow_range=False)
but whatever I try, the min/max values stay the same. As a result, the model (fine-tuned on only one sentence for now) adapts the weights to these fixed ranges instead of adjusting the activation ranges at all. I believe that both the ranges and the weights should be updated during training with fakeQuants (am I correct?).
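For reference, here is my (possibly wrong) understanding of what the fakeQuant op and its gradient compute, sketched in NumPy and ignoring TF's internal min/max nudging: the straight-through gradient passes through for in-range inputs, and the range variables only receive gradient from inputs that fall outside [min, max].

```python
import numpy as np

def fake_quant(x, mn, mx, num_bits=8):
    """Quantize x to 2**num_bits levels on [mn, mx], then dequantize."""
    levels = 2 ** num_bits - 1
    scale = (mx - mn) / levels
    x_clipped = np.clip(x, mn, mx)
    return np.round((x_clipped - mn) / scale) * scale + mn

def fake_quant_grads(x, mn, mx, grad):
    """Straight-through gradients, as I understand the TF kernel:
    d/dx passes grad where x is inside [mn, mx];
    d/dmn collects grad from inputs clipped below;
    d/dmx collects grad from inputs clipped above."""
    below, above = x < mn, x > mx
    inside = ~(below | above)
    d_x = grad * inside
    d_mn = np.sum(grad[below])
    d_mx = np.sum(grad[above])
    return d_x, d_mn, d_mx
```

If this matches the real kernel, the min/max variables get zero gradient whenever all inputs already lie inside the range, which would explain why my ranges barely move once the weights have shrunk into [0, 1].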
Also, the tf.contrib.quantize.create_training_graph tool described here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize does not work in my case, because the model is defined in such a way that the tool cannot figure out where these fakeQuant operations should be inserted. That is why I am trying the more "hacky" way of injecting them into the graph source code myself.
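As far as I can tell from reading the contrib sources (my reading, not something the docs state explicitly), that rewriter does not learn activation ranges by gradient descent at all: it tracks them with an exponential moving average of the observed batch min/max, roughly like this:

```python
import numpy as np

def update_range_ema(batch, range_min, range_max, decay=0.999):
    # Pull the tracked range toward the min/max actually observed
    # in this batch, instead of updating it by a gradient.
    range_min = decay * range_min + (1 - decay) * float(np.min(batch))
    range_max = decay * range_max + (1 - decay) * float(np.max(batch))
    return range_min, range_max
```

If fakeQuant ranges are meant to be maintained like this, my trainable min/max variables would need explicit assign ops of this form rather than relying on backprop.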
Has anyone managed to do this properly, or have I totally misunderstood how the fakeQuant operators work?
Thanks!