The paper "Natural Language Processing with Small Feed-Forward Networks" https://arxiv.org/pdf/1708.00214.pdf states:
I've implemented quantization as per the above equations in python:
b = 128
embedding_matrix = [[20000,3000,1000],[1999999,20000,1999999], [20000,3000,1000]]
scaled = [ abs(round( (1 / (b - 1) * max(e)) , 3)) for e in embedding_matrix]
print(scaled)
i = 0
quantized = []
for e in embedding_matrix :
for v in e :
quantized.append((v , math.floor(.5 + ( (v / scaled[i]) + b) )))
i = i + 1
quantized
Running this code quantized
is set to :
[(20000, 255),
(3000, 147),
(1000, 134),
(1999999, 255),
(20000, 129),
(1999999, 255),
(20000, 255),
(3000, 147),
(1000, 134)]
How to de-quantize back to the original values prior to quantization ?
Reading https://www.tensorflow.org/api_docs/python/tf/quantization/dequantize describes :
tf.quantization.dequantize(
input, min_range, max_range, mode='MIN_COMBINED', name=None, axis=None,
narrow_range=False, dtype=tf.dtypes.float32
)
[min_range, max_range] are scalar floats that specify the range for the output. The 'mode' attribute controls exactly which calculations are used to convert the float values to their quantized equivalents.
and the PyTorch docs: https://pytorch.org/docs/stable/quantization.html
Seems to implement quantize differently to above implementation ?