Quantization scheme for Convolutional Neural Network 8-bit quantization in tensorflow

Question

TensorFlow code for quantization. In all the papers I have referred to for CNN quantization, the quantization scheme is stated as

step size = range/255 for 8-bit, where range = xmax - xmin. But as shown in the image, in the TensorFlow implementation

the range is given by range = std::max(std::abs(*min_value), std::abs(*max_value));

Can anyone tell me the difference or the purpose?

score 0 · Answer 1 · answered Apr 01 '20 at 23:10

This is because the code you are pointing to implements symmetric quantization, where the range needs to be the same on both sides of 0. So the "range" variable in that code really refers to half of the entire floating-point range.

For instance, with min_value = -1 and max_value = 2:

range = std::max(abs(-1), abs(2)) = 2

So the entire range in that code will be -2 to 2.
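To make the contrast concrete, here is a minimal sketch of the two step-size computations. The function names are hypothetical and are not part of TensorFlow. The divisor 127 for the symmetric case is an assumption taken from the follow-up comment (scaling factor = 2/127), so check it against the actual kernel you are reading.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helpers for illustration only; not TensorFlow's API.

// Asymmetric scheme from the papers: 255 steps span the full interval
// [min_value, max_value].
float AsymmetricStep(float min_value, float max_value) {
  return (max_value - min_value) / 255.0f;
}

// Symmetric scheme: "range" is the one-sided extent max(|min|, |max|),
// so the representable interval is [-range, +range].
// Assumption: 127 steps cover one side, as in the comment's 2/127 example.
float SymmetricStep(float min_value, float max_value) {
  const float range = std::max(std::abs(min_value), std::abs(max_value));
  return range / 127.0f;
}
```

With min_value = -1 and max_value = 2, AsymmetricStep divides the full width 3 by 255, while SymmetricStep divides the one-sided range 2 by 127. The symmetric version spends some levels on the unused interval from -2 to -1, but in exchange 0.0 maps exactly to the integer 0.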

Hope that makes sense!

suharshs

Thank you. So when we calculate the step size, do we consider the one-sided range? In the same code, taking your example, the scaling factor = 2/127. – Akash Bhogar Apr 02 '20 at 03:51