I wish to quantize the weights and biases of an existing neural network model. As I understand it, a fixed-point representation fixes the bit-width of the weights, biases, and activations, with a predetermined number of integer and fraction bits.
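To make it concrete, here is a small sketch of the rounding scheme I have in mind (the helper `to_fixed_point` and its Q-format conventions are just my own illustration, not from any particular library):

```python
import numpy as np

def to_fixed_point(x, int_bits, frac_bits, signed=True):
    """Round x onto a fixed-point grid with the given integer/fraction split.

    Values are scaled by 2**frac_bits, rounded to the nearest integer,
    clipped to the representable range, then scaled back to floats.
    """
    scale = 2.0 ** frac_bits
    total_bits = int_bits + frac_bits + (1 if signed else 0)
    if signed:
        lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    else:
        lo, hi = 0, 2 ** total_bits - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale

w = np.array([0.7231, -1.4056, 0.0493])
# With 2 integer and 5 fraction bits: [0.71875, -1.40625, 0.0625]
print(to_fixed_point(w, int_bits=2, frac_bits=5))
```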
Essentially, I want to perform post-training quantization. I checked out this article: https://www.tensorflow.org/model_optimization/guide/quantization/post_training.
However, I couldn't find any support for what I want to do, i.e. specifying the number of integer and fraction bits of the fixed-point representation for the weights, biases, and activations.
I did find the QKeras library, which seems to support this functionality. However, it does not appear to have a built-in quantized sigmoid layer.
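For reference, this is roughly how far I got with QKeras: `quantized_bits(bits, integer)` lets me pin the total bit-width and the number of integer bits for weights and biases. The sigmoid workaround at the end (a float sigmoid followed by an output quantizer) is my own improvisation, not an official recipe:

```python
from tensorflow import keras
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

model = keras.Sequential([
    keras.layers.Input(shape=(16,)),
    # 8-bit weights/biases with 2 integer bits
    QDense(8,
           kernel_quantizer=quantized_bits(8, 2, alpha=1),
           bias_quantizer=quantized_bits(8, 2, alpha=1)),
    QActivation(quantized_relu(8, 2)),
    # No built-in quantized sigmoid as far as I can tell, so my current
    # workaround: a float sigmoid followed by an unsigned output quantizer
    # (sigmoid outputs lie in [0, 1), so no integer or sign bits needed).
    keras.layers.Dense(1, activation="sigmoid"),
    QActivation(quantized_bits(8, 0, keep_negative=False)),
])
```

I am not sure whether quantizing the sigmoid's output like this is equivalent to a properly quantized sigmoid layer, which is partly why I am asking.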
Any pointers or library/article recommendations that could help me achieve this would be greatly appreciated.