Could anyone suggest a way of defining the computational complexity of a neural network after quantization?
I understand computational complexity as the amount of arithmetic “work” needed to evaluate the entire network or a single layer. However, once a network has been quantized, numbers are no longer represented in the same format (the new format depends on the quantization method used, as described here). For example, instead of multiplying two real (float32) numbers in every operation, we multiply pairs of int8, float16, etc. values. The latter operations are evidently “simpler” than multiplying two 32-bit floats.
This affects the time and memory needed to carry out the computations, so traditional metrics such as Big-O notation no longer capture the cost: Big-O counts operations but treats every multiplication as unit cost, regardless of the operands' bit-widths.
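One hedged way to answer my own question, as a starting point: a metric sometimes used for quantized networks is bit-operations (BOPs), where each multiply-accumulate is weighted by the product of its operand bit-widths, so lower-precision layers count as proportionally cheaper. The sketch below illustrates the idea; the function name and the example layer dimensions are my own assumptions, not from any particular library.

```python
def layer_bops(macs: int, weight_bits: int, act_bits: int) -> int:
    """Bit-operations for one layer: each multiply-accumulate (MAC)
    is weighted by the product of its operand bit-widths (assumption:
    this simple product model, ignoring accumulator width)."""
    return macs * weight_bits * act_bits

# Hypothetical 3x3 conv layer: 64 in-channels, 64 out-channels,
# producing a 56x56 output feature map.
macs = 64 * 64 * 3 * 3 * 56 * 56

fp32_cost = layer_bops(macs, 32, 32)  # unquantized float32 baseline
int8_cost = layer_bops(macs, 8, 8)    # int8 weights and activations

print(int8_cost / fp32_cost)  # -> 0.0625, i.e. a 16x reduction
```

Under this model the MAC count (the usual Big-O-style quantity) stays the same, but the bit-width factor makes the quantized network measurably cheaper, which seems closer to the notion of “work” I am after.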