To the best of my knowledge, existing quantization methods operate on 32-bit values. In order to quantize the weights of a CNN, reduce its memory footprint, and then port the quantized model to a mobile device, how do I convert a 32-bit operation into a 4-bit or 8-bit operation on the CPU?
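For reference, this is roughly what I mean by quantizing 32-bit weights down to 8 bits (a minimal NumPy sketch; the tensor shape and the asymmetric int8 scheme are just illustrative placeholders):

    import numpy as np

    def quantize_int8(w):
        # Affine (asymmetric) quantization: w ~ scale * (q - zero_point)
        w_min, w_max = float(w.min()), float(w.max())
        scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
        zero_point = int(round(-128 - w_min / scale))
        q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
        return q, scale, zero_point

    w = np.random.randn(16, 3, 3, 3).astype(np.float32)    # toy conv weight
    q, scale, zp = quantize_int8(w)
    w_hat = scale * (q.astype(np.float32) - zp)             # dequantized approximation
    print(w.nbytes, "->", q.nbytes)                          # 4x smaller storage

Storing the weights this way already shrinks the model, but the question is how to make the arithmetic itself run in 4 or 8 bits.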
- It isn't quite clear what you mean by that. If your data is 32-bit, then doing 8-bit operations on it won't do you any good. If your data is quantised to an 8-bit data type, you just use 8-bit operations on your CPU. Note that not all CPUs have 8-bit operations, or have 8-bit operations that are faster than 32-bit ones. If yours does have such operations, you will probably have to write custom kernels that utilise them. You don't convert 32-bit operations to anything; you write the kernels from scratch. – n. m. could be an AI Jul 03 '20 at 08:33
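To illustrate what such a kernel does, here is a minimal Python sketch (the scales and zero-points are made-up values); real CPU kernels perform the same multiply-accumulate with int8 SIMD instructions (e.g. ARM SDOT, x86 VNNI), accumulating into 32-bit registers:

    import numpy as np

    def int8_dot(a_q, b_q, a_scale, b_scale, a_zp=0, b_zp=0):
        # Multiply-accumulate int8 inputs in a 32-bit integer accumulator,
        # then rescale to float once at the very end.
        acc = 0
        for x, y in zip(a_q.astype(np.int32), b_q.astype(np.int32)):
            acc += int(x - a_zp) * int(y - b_zp)    # pure integer arithmetic
        return acc * a_scale * b_scale              # single float rescale

    a = np.random.randint(-128, 128, size=64, dtype=np.int8)
    b = np.random.randint(-128, 128, size=64, dtype=np.int8)
    print(int8_dot(a, b, a_scale=0.02, b_scale=0.05))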
- @n. 'pronouns' m. Thank you for your reply. I applied quantization using PyTorch, and the weights are quantized, but the computation on the CPU is not. In other words, even after I quantize the weights of the deep learning model, the CPU still handles each value as 32 bits (e.g. 0000 0000 ... 0010), whereas I want the arithmetic to operate on only 4 bits (e.g. 0010). Through this I want to reduce the amount of computation. – user13851207 Jul 03 '20 at 13:19
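In PyTorch's eager-mode quantization, getting the CPU computation itself to run in int8 (rather than just storing int8 weights) generally means post-training static quantization with a quantized backend such as 'qnnpack' for mobile/ARM. A minimal sketch, assuming a toy conv model (the layer sizes and the random calibration data are placeholders):

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = torch.quantization.QuantStub()      # fp32 -> int8 at the input
            self.conv = nn.Conv2d(3, 16, 3)
            self.relu = nn.ReLU()
            self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 at the output

        def forward(self, x):
            return self.dequant(self.relu(self.conv(self.quant(x))))

    model = TinyCNN().eval()
    torch.backends.quantized.engine = 'qnnpack'              # ARM/mobile backend; 'fbgemm' on x86
    model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
    torch.quantization.prepare(model, inplace=True)
    with torch.no_grad():                                    # calibrate observers with sample data
        model(torch.randn(8, 3, 32, 32))
    torch.quantization.convert(model, inplace=True)
    out = model(torch.randn(1, 3, 32, 32))                   # conv now dispatches to int8 kernels

This only covers the 8-bit case; as the next comment notes, 4-bit execution is a different matter.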
- You need a 4-bit CPU, which is not exactly a common thing. – n. m. could be an AI Jul 03 '20 at 14:07