
Apologies for my newbie question.

From documents like the one below, I understand the advantages of using 8-bit numbers: they save memory and increase performance with very little impact on accuracy:

https://www.tensorflow.org/performance/quantization

Other blogs mention that these quantized models can be offloaded to DSPs. I have a low-cost DSP that can do 168 multiply-accumulates (a multiplication plus an addition) with 9-bit inputs in a single clock cycle at very low power consumption, and I would like to use it to run inference on some models I have trained. I don't want to use any existing frameworks, as they wouldn't fit or run on the target anyway. I would like to just train the model, save it, and then read the weights myself, hard-coding the flow/graph of the network by hand as a proof of concept.

When I look at it as compression only, it makes sense to store a min/max per layer plus 8-bit weights, even though this can produce non-symmetrical ranges where 0 is not in the middle of the 8-bit range. Decompressing a weight back into a 32-bit real value to do the calculation is easy.
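
To make sure I understand the scheme correctly, here is a minimal sketch of what I mean, assuming the affine mapping real = scale * (q - zero_point) that the TensorFlow docs describe; all the names here (QuantParams, quantize, dequantize) are my own, not from any framework:

    /* Minimal sketch of the asymmetric (affine) 8-bit scheme as I understand it:
     *   real_value = scale * (quantized_value - zero_point)
     * All names are mine, not from any framework. */
    #include <stdint.h>
    #include <stdio.h>
    #include <math.h>

    typedef struct {
        float   scale;       /* (max - min) / 255 for 8-bit codes */
        uint8_t zero_point;  /* the 8-bit code that represents real 0.0 */
    } QuantParams;

    static QuantParams params_from_min_max(float min, float max)
    {
        QuantParams p;
        p.scale = (max - min) / 255.0f;
        p.zero_point = (uint8_t)roundf(-min / p.scale);  /* 0.0 maps to an exact code */
        return p;
    }

    static uint8_t quantize(float x, QuantParams p)
    {
        float q = roundf(x / p.scale) + (float)p.zero_point;
        if (q < 0.0f)   q = 0.0f;    /* clamp into the 8-bit range */
        if (q > 255.0f) q = 255.0f;
        return (uint8_t)q;
    }

    static float dequantize(uint8_t q, QuantParams p)
    {
        return p.scale * ((int)q - (int)p.zero_point);
    }

    int main(void)
    {
        /* deliberately non-symmetrical range, so zero_point != 128 */
        QuantParams p = params_from_min_max(-1.5f, 2.5f);
        uint8_t q = quantize(0.7f, p);
        printf("code=%u  real=%f\n", q, dequantize(q, p));
        return 0;
    }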

But multiple blogs still mention that this approach can be used directly on DSPs, doing the calculations with 8-bit numbers, and I can't grasp how that would be implemented. In the past I used fixed-point math, where I pretended there was a binary point at some fixed position and shifted the result after each multiplication. I don't think that trick works when the min/max, non-symmetrical approach is used to train/store the model. Am I missing something? I can't see how this could be implemented at a low level with the simple integer multipliers in DSPs.
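
For comparison, this is the kind of fixed-point trick I mean, here in a Q8.8 format purely as an illustration (it also assumes an arithmetic right shift for negative values, which holds on every DSP toolchain I have used):

    /* Old-style fixed-point multiply, Q8.8 as an example: pretend the binary
     * point sits 8 bits up, multiply as plain integers, then shift the
     * 32-bit product back down. */
    #include <stdint.h>
    #include <stdio.h>

    #define FRAC_BITS 8  /* Q8.8: 8 integer bits, 8 fractional bits */

    static int16_t to_q8_8(float x)     { return (int16_t)(x * (1 << FRAC_BITS)); }
    static float   from_q8_8(int16_t q) { return (float)q / (1 << FRAC_BITS); }

    static int16_t q8_8_mul(int16_t a, int16_t b)
    {
        /* the raw product is Q16.16; one shift brings it back to Q8.8 */
        return (int16_t)(((int32_t)a * b) >> FRAC_BITS);
    }

    int main(void)
    {
        int16_t a = to_q8_8(1.25f), b = to_q8_8(-2.5f);
        printf("%f\n", from_q8_8(q8_8_mul(a, b)));  /* prints -3.125 */
        return 0;
    }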

Anton Krug
