
I'm working on a Federated Learning project in which resource constraints are simulated, so the local model updates have to be compressed. The compression does not have to be efficient in any way; we just want to see the effects this form of compression has on convergence and convergence time.

I have tried different approaches, similar to:

import copy
import torch

def randomQuantization(w, w_global, ratio, default_bit_depth=33):
    w_avg = copy.deepcopy(w[0])
    for k in w_avg.keys():
        maxElement = torch.max(w_global[k]).item()
        minElement = torch.min(w_global[k]).item()
        for i in range(1, len(w)):
            # each client i gets a bit depth scaled by its ratio
            bit_depth = round(default_bit_depth * ratio[i])
            scale = (2**bit_depth - 1) / max(abs(maxElement), abs(minElement))
            # this is the call that fails: FakeQuantize is a module class,
            # not a function, and quant_min/quant_max must be integers
            w_temp = torch.quantization.FakeQuantize(w[i][k], quant_min=minElement, quant_max=maxElement, scale=scale, fake_quant_enable=True)
            w_avg[k] += w_temp
        w_avg[k] = torch.div(w_avg[k], len(w))
    return w_avg

It seems like torch.quantization.FakeQuantize does not work for this situation. Quantization where we choose a dtype, like dynamic quantization, is a non-starter, since there are so few float types to choose from. Many of the "normal" ways of doing this also do not seem compatible with the tensor datatype, at least not the ones I've tried.
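To be concrete about what I mean by "fake" quantization: the tensor should stay torch.float32, but each value should be snapped to one of the 2**bits - 1 levels it could take at a lower bit depth. A rough sketch of the idea, just to illustrate the goal (fake_quantize and its bits argument are names I made up, not a torch API):

import torch

def fake_quantize(x, bits=12):
    # snap x to 2**bits - 1 uniform levels spanning [min(x), max(x)],
    # then map back, so the result is still float32
    levels = 2**bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min) / levels
    if scale == 0:                         # constant tensor, nothing to do
        return x.clone()
    q = torch.round((x - x_min) / scale)   # integer level index
    return q * scale + x_min               # dequantize back to float32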

Edit: After searching Google Scholar and GitHub for days, I found this solution:

def stochasticQuantization(w, ratio, default_bit_depth=32):
    # https://github.com/scottjiao/Gradient-Compression-Methods/blob/master/utils.py
    w_avg = copy.deepcopy(w[0])
    for k in w_avg.keys():
        for i in range(1, len(w)):
            x = w[i][k].float()
            # the higher a client's ratio, the fewer quantization levels it keeps
            levels = round(default_bit_depth * (1 - ratio[i]))
            x_norm = torch.norm(x, p=float('inf'))
            sgn_x = ((x > 0).float() - 0.5) * 2        # elementwise sign, +1 or -1
            p = torch.div(torch.abs(x), x_norm)        # magnitudes normalized to [0, 1]
            renormalize_p = torch.mul(p, levels)       # position on the level grid
            floor_p = torch.floor(renormalize_p)       # nearest level below
            compare = torch.rand_like(floor_p)
            final_p = renormalize_p - floor_p          # fractional distance to that level
            margin = (compare < final_p).float()       # stochastically round up
            xi = (floor_p + margin) / levels           # quantized magnitude in [0, 1]
            w_avg[k] += x_norm * sgn_x * xi
        w_avg[k] = torch.div(w_avg[k], len(w))
    return w_avg

If I understand it correctly, this is based on the QSGD paper: https://arxiv.org/pdf/1610.02132.pdf
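For reference, a toy call, assuming w is a list of client state_dicts and ratio[i] is the fraction of the bit budget client i drops (the nn.Linear clients below are purely illustrative):

import torch.nn as nn

# three toy "clients" sharing one architecture; client 0's weights are
# used unquantized as the base of the average
clients = [nn.Linear(4, 2).state_dict() for _ in range(3)]
ratios = [0.0, 0.5, 0.75]   # clients 1 and 2 keep 16 and 8 levels respectively

w_avg = stochasticQuantization(clients, ratios)
print({k: tuple(v.shape) for k, v in w_avg.items()})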

  • What do you mean by "does not work"? Could you please expand your question to include a more detailed description of the expected behaviour? – Bedir Yilmaz Oct 04 '21 at 14:17
  • torch.quantization.FakeQuantize can only use integers with a maximum of 16 bits. I want to quantize a torch.float32 so that the information, in theory, would fit into fewer bits than torch.float32 requires. Regarding "fake" quantize: what I mean is that we still want the output to be torch.float32, but we want to manipulate the values to approximate how they would be represented if we only had, for example, 12 bits to use. – JarrList Oct 05 '21 at 16:20
  • I see, I recommend expanding your question with the facts you have just provided. – Bedir Yilmaz Oct 06 '21 at 06:24

0 Answers