
I'm following this guide to perform quantization on my model stripped_clustered_model. Unfortunately, the model contains a layer that cannot be quantized (a Rescaling layer). To account for that, I'm using quantize_annotate_layer to mark only the other layers for quantization. I'm doing that with this code:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def apply_quantization_to_non_rescaling(layer):
    # Annotate every layer for quantization except the Rescaling layer.
    if not isinstance(layer, tf.keras.layers.Rescaling):
        print('=> NOT Rescaling')
        return tfmot.quantization.keras.quantize_annotate_layer(layer, quantize_config=None)
    print('=> Rescaling')
    return layer

quant_aware_annotate_model = tf.keras.models.clone_model(
    stripped_clustered_model,
    clone_function=apply_quantization_to_non_rescaling,
)

pcqat_model = tfmot.quantization.keras.quantize_apply(
    quant_aware_annotate_model,
    tfmot.experimental.combine.Default8BitClusterPreserveQuantizeScheme(preserve_sparsity=True)
)

As I understand it, I mark all layers that I want to quantize with quantize_annotate_layer and then call quantize_apply to actually perform the quantization. However, running this code leads to the following error:

=> Rescaling
=> NOT Rescaling
=> NOT Rescaling
=> NOT Rescaling
=> NOT Rescaling
=> NOT Rescaling
=> NOT Rescaling
=> NOT Rescaling
=> NOT Rescaling
=> NOT Rescaling
Traceback (most recent call last):
  File "model_2.py", line 332, in <module>
    main()
  File "model_2.py", line 304, in main
    pcqat_model = tfmot.quantization.keras.quantize_apply(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_model_optimization/python/core/keras/metrics.py", line 74, in inner
    raise error
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_model_optimization/python/core/keras/metrics.py", line 69, in inner
    results = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_model_optimization/python/core/quantization/keras/quantize.py", line 474, in quantize_apply
    return keras.models.clone_model(
  File "/home/user/.local/lib/python3.8/site-packages/keras/models.py", line 453, in clone_model
    return _clone_sequential_model(
  File "/home/user/.local/lib/python3.8/site-packages/keras/models.py", line 330, in _clone_sequential_model
    if isinstance(layer, InputLayer) else layer_fn(layer))
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_model_optimization/python/core/quantization/keras/quantize.py", line 408, in _quantize
    full_quantize_config = quantize_registry.get_quantize_config(layer)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_model_optimization/python/core/quantization/keras/collaborative_optimizations/cluster_preserve/cluster_preserve_quantize_registry.py", line 293, in get_quantize_config
    quantize_config = (default_8bit_quantize_registry.
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantize_registry.py", line 272, in get_quantize_config
    raise ValueError(
ValueError: `get_quantize_config()` called on an unsupported layer <class 'keras.layers.preprocessing.image_preprocessing.Rescaling'>. Check if layer is supported by calling `supports()`. Alternatively, you can use `QuantizeConfig` to specify a behavior for your layer.

The output shows that every layer except the first one (the Rescaling layer) is marked for quantization. The error, however, tells me that the Rescaling layer is still being pulled into quantization.
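
To double-check which layers actually got annotated, the cloned model can be inspected before calling quantize_apply. This is just a diagnostic sketch; it only prints layer classes and names:

for layer in quant_aware_annotate_model.layers:
    # Annotated layers are wrapped by quantize_annotate_layer, so their class
    # differs from the original layer class; the Rescaling layer should keep its own.
    print(layer.__class__.__name__, layer.name)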

How can I exclude the rescaling layer from quantization?


Update 22.04.2022: Falling back to the default quantization strategy by using

pcqat_model = tfmot.quantization.keras.quantize_apply(
    quant_aware_annotate_model
)

instead of


pcqat_model = tfmot.quantization.keras.quantize_apply(
    quant_aware_annotate_model,
    tfmot.experimental.combine.Default8BitClusterPreserveQuantizeScheme(preserve_sparsity=True)
)

is not an option, since this would not preserve sparsity and clustering.
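
For reference, one rough way to check whether sparsity and clustering actually survive is to look at the zero fraction and the number of unique values per kernel. A sketch, where summarize_weights is an illustrative helper and not part of the original code:

import numpy as np

def summarize_weights(model):
    # Clustering should keep the number of unique kernel values small,
    # and pruning should keep the zero fraction (sparsity) high.
    for w in model.weights:
        if 'kernel' in w.name:
            values = w.numpy()
            print(w.name,
                  'sparsity:', float(np.mean(values == 0)),
                  'unique values:', len(np.unique(values)))

summarize_weights(stripped_clustered_model)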


1 Answer


So there apparently is an error when passing a quantization scheme other than the default one. If you just use pcqat_model = tfmot.quantization.keras.quantize_apply(quant_aware_annotate_model), it works. I also tried the other non-experimental quantization schemes, but they throw errors as well. So if you are absolutely set on a scheme other than the default one, this won't help you, but if you just want quantization, use the default one.
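
In code, the fallback described above is the plain call from the question's update, followed by the usual Keras compile step. A minimal sketch; the optimizer, loss, and metrics are placeholders, not taken from the original post:

pcqat_model = tfmot.quantization.keras.quantize_apply(quant_aware_annotate_model)
# The model returned by quantize_apply needs to be compiled again before
# fine-tuning or evaluation; these arguments are placeholders.
pcqat_model.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])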

Noltibus
  • Hi Noltibus, thank you for your answer. It does work when I fall back to the default quantization strategy. Unfortunately, that means sparsity and clustering are not preserved, and preserving them is exactly why I want to apply this scheme in the first place. – mat.tho Apr 22 '22 at 08:21