Is it possible to quantize individual layers weights/activations post training?

Question

Quantization-aware training in Tensorflow allows me to quantize individual levels with different quantization configurations using tensorflow_model_optimization.quantization.keras.quantize_annotate_layer. I want to have a similar effect on an already-trained model.

In the post-training quantization documentation of Tensorflow, the following is an example of quantizing a model to float16.

import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()

However, I believe this quantizes all model layers' activations and weights. Is there a way to only select certain tensorflow.keras.Layer instances in the model after training and saving the model file?

score 0 · Answer 1 · answered Oct 26 '22 at 07:23

Well, you can go to this page of Tensorflow where they have to tell the exact procedure for Quantizing specific Layers...

https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide.md

But here is an example...

# Create a base model
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy

# Helper function uses `quantize_annotate_layer` to annotate that only the 
# Dense layers should be quantized.
def apply_quantization_to_dense(layer):
  if isinstance(layer, tf.keras.layers.Dense):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

# Use `tf.keras.models.clone_model` to apply `apply_quantization_to_dense` 
# to the layers of the model.
annotated_model = tf.keras.models.clone_model(
    base_model,
    clone_function=apply_quantization_to_dense,
)

# Now that the Dense layers are annotated,
# `quantize_apply` actually makes the model quantization aware.
quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
quant_aware_model.summary()

A functional Example..

# Use `quantize_annotate_layer` to annotate that the `Dense` layer
# should be quantized.
i = tf.keras.Input(shape=(20,))
x = tfmot.quantization.keras.quantize_annotate_layer(tf.keras.layers.Dense(10))(i)
o = tf.keras.layers.Flatten()(x)
annotated_model = tf.keras.Model(inputs=i, outputs=o)

# Use `quantize_apply` to actually make the model quantization aware.
quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)

# For deployment purposes, the tool adds `QuantizeLayer` after `InputLayer` so that the
# quantized model can take in float inputs instead of only uint8.
quant_aware_model.summary()

Is it possible to quantize individual layers weights/activations post training?

1 Answers1