How can I matrix-multiply two PyTorch quantized Tensors?

Question

I am new to tensor quantization, and tried doing something as simple as

import torch
x = torch.rand(10, 3)
y = torch.rand(10, 3)

x@y.T

with PyTorch quantized tensors running on CPU. I thus tried

scale, zero_point = 1e-4, 2
dtype = torch.qint32
qx = torch.quantize_per_tensor(x, scale, zero_point, dtype)
qy = torch.quantize_per_tensor(y, scale, zero_point, dtype)

qx@qy.T # I tried...

...and got this error:

RuntimeError: Could not run 'aten::mm' with arguments from the 'QuantizedCPUTensorId' backend. 'aten::mm' is only available for these backends: [CUDATensorId, SparseCPUTensorId, VariableTensorId, CPUTensorId, SparseCUDATensorId].

Is matrix multiplication just not supported, or am I doing something wrong?

score 5 · Answer 1 · answered Feb 24 '20 at 06:01

Matrix multiplication is not straightforward to implement for quantized matrices. The "conventional" matrix multiplication (@) therefore does not support them, as your error message indicates: 'aten::mm' has no implementation for the quantized CPU backend.
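To illustrate why this is more than a plain matmul, here is a minimal sketch of the arithmetic for per-tensor affine quantization. It mimics the quantization by hand with ordinary int64 tensors; the helper `to_int` is hypothetical and is not how PyTorch's quantized kernels are implemented. The zero points have to be subtracted before multiplying, and the integer result carries the product of the two scales.

```python
import torch

torch.manual_seed(0)
x = torch.rand(10, 3)
y = torch.rand(10, 3)
scale, zero_point = 1e-4, 2

def to_int(t, scale, zero_point):
    # Hypothetical helper: q = round(t / scale) + zero_point, kept as int64
    return torch.round(t / scale).to(torch.int64) + zero_point

ix = to_int(x, scale, zero_point)
iy = to_int(y, scale, zero_point)

# Remove the zero points, then accumulate the products in integers
acc = (ix - zero_point) @ (iy - zero_point).T

# The accumulator is expressed in units of scale_x * scale_y
approx = acc.to(torch.float64) * (scale * scale)

# Largest deviation from the float result (rounding error only)
max_err = (approx - (x @ y.T).to(torch.float64)).abs().max().item()
```

A real quantized kernel additionally has to pick an output scale and zero point and requantize the accumulator, which is why the quantized operations take those as explicit arguments.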

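You should look at the quantized operations instead, e.g., torch.nn.quantized.functional.linear. It computes input @ weight.T, so the weight is passed untransposed. Per its documentation it expects a torch.quint8 input and a torch.qint8 weight, so qx and qy would need to be quantized with those dtypes rather than torch.qint32:

torch.nn.quantized.functional.linear(qx[None,...], qy)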
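If integer kernels are not essential, one possible fallback is to dequantize both tensors, multiply in floating point, and re-quantize the result. This is an assumption about the use case rather than part of the answer above. The sketch below reuses the question's scale and zero point for the output, which is only an illustrative choice.

```python
import torch

x = torch.rand(10, 3)
y = torch.rand(10, 3)
scale, zero_point = 1e-4, 2
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.qint32)
qy = torch.quantize_per_tensor(y, scale, zero_point, torch.qint32)

# Convert back to float32 and use the ordinary matmul
out = qx.dequantize() @ qy.dequantize().T

# Optionally re-quantize; the output scale/zero_point here are assumptions
qout = torch.quantize_per_tensor(out, scale, zero_point, torch.qint32)
```

This gives up the speed and memory benefits of integer arithmetic during the multiplication itself, so it is mainly useful for checking results or for code paths that are not performance critical.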
