I have a wav2letter (speech recognition) model where I am trying to introduce the fakeQuant operations manually. I have managed to insert them in the right places (so that the TFLite converter manages to generate the u8 tflite model), but my problem is that the min/max ranges of these operations are not updated during training. They stay the same (or nearly the same) from the beginning, and the gradients do not seem to flow to them.
I tried defining the fakeQuant in several ways, e.g.:
min_w = tf.get_variable("min_quant_weights", shape=[], initializer=tf.constant_initializer(0), trainable=True)
max_w = tf.get_variable("max_quant_weights", shape=[], initializer=tf.constant_initializer(1), trainable=True)
filters = tf.fake_quant_with_min_max_vars(filters, min=tf.reduce_min(min_w), max=tf.reduce_max(max_w), num_bits=8)
or
min_w = tf.Variable(0.0, name="min_quant_weights")
max_w = tf.Variable(1.0, name="max_quant_weights")
filters = tf.fake_quant_with_min_max_vars(filters, min=min_w, max=max_w)
or
min_w = 0
max_w = 1
filters = tf.quantization.fake_quant_with_min_max_args(filters, min=min_w, max=max_w, num_bits=8, narrow_range=False)
but whatever I try, the min/max values stay the same. As a result, the model (fine-tuned on only one sentence for now) adapts the weights to these fixed ranges instead of adjusting the activation ranges at all. I believe that both the ranges and the weights should be updated during training with fakeQuants (am I correct?).
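For reference, here is my (possibly wrong) understanding of what the fakeQuant op and its gradient compute, sketched in NumPy and ignoring TF's internal min/max nudging: the straight-through gradient passes through for in-range inputs, and the range variables only receive gradient from inputs that fall outside [min, max].

```python
import numpy as np

def fake_quant(x, mn, mx, num_bits=8):
    """Quantize x to 2**num_bits levels on [mn, mx], then dequantize."""
    levels = 2 ** num_bits - 1
    scale = (mx - mn) / levels
    x_clipped = np.clip(x, mn, mx)
    return np.round((x_clipped - mn) / scale) * scale + mn

def fake_quant_grads(x, mn, mx, grad):
    """Straight-through gradients, as I understand the TF kernel:
    d/dx passes grad where x is inside [mn, mx];
    d/dmn collects grad from inputs clipped below;
    d/dmx collects grad from inputs clipped above."""
    below, above = x < mn, x > mx
    inside = ~(below | above)
    d_x = grad * inside
    d_mn = np.sum(grad[below])
    d_mx = np.sum(grad[above])
    return d_x, d_mn, d_mx
```

If this matches the real kernel, the min/max variables get zero gradient whenever all inputs already lie inside the range, which would explain why my ranges barely move once the weights have shrunk into [0, 1].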
Also, the tf.contrib.quantize.create_training_graph tool described here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize does not work in my case, because the model is defined in such a way that the tool cannot figure out where these fakeQuant operations should be inserted. That is why I am trying the more "hacky" way of injecting them into the graph source code myself.
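As far as I can tell from reading the contrib sources (my reading, not something the docs state explicitly), that rewriter does not learn activation ranges by gradient descent at all: it tracks them with an exponential moving average of the observed batch min/max, roughly like this:

```python
import numpy as np

def update_range_ema(batch, range_min, range_max, decay=0.999):
    # Pull the tracked range toward the min/max actually observed
    # in this batch, instead of updating it by a gradient.
    range_min = decay * range_min + (1 - decay) * float(np.min(batch))
    range_max = decay * range_max + (1 - decay) * float(np.max(batch))
    return range_min, range_max
```

If fakeQuant ranges are meant to be maintained like this, my trainable min/max variables would need explicit assign ops of this form rather than relying on backprop.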
Has anyone managed to do this properly, or have I totally misunderstood how the fakeQuant operators work?
Thanks!