I am new to tensor quantization, and tried doing something as simple as

import torch
x = torch.rand(10, 3)
y = torch.rand(10, 3)

x@y.T

with PyTorch quantized tensors running on CPU. I thus tried

scale, zero_point = 1e-4, 2
dtype = torch.qint32
qx = torch.quantize_per_tensor(x, scale, zero_point, dtype)
qy = torch.quantize_per_tensor(y, scale, zero_point, dtype)

qx@qy.T # I tried...

...and got this error:

RuntimeError: Could not run 'aten::mm' with arguments from the 'QuantizedCPUTensorId' backend. 'aten::mm' is only available for these backends: [CUDATensorId, SparseCPUTensorId, VariableTensorId, CPUTensorId, SparseCUDATensorId].

Is matrix multiplication just not supported, or am I doing something wrong?

Davide Fiocco

1 Answer

It is not straightforward to implement matrix multiplication for quantized matrices. Therefore, the "conventional" matrix multiplication operator (@) does not support quantized tensors, as your error message suggests: 'aten::mm' has no implementation for the quantized CPU backend.

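You should look at quantized operations, e.g., torch.nn.quantized.functional.linear. Note that it computes input @ weight.T, so the weight is passed untransposed, and that it expects a quint8 input and a qint8 weight rather than qint32 tensors:

torch.nn.quantized.functional.linear(qx[None,...], qy)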
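If the qint32 tensors from the question have to be kept as they are, one simple fallback is to dequantize both operands and multiply in floating point. This is only a sketch of a workaround, assuming integer arithmetic is not actually required: the multiplication itself runs in float32, so it gives none of the speed or memory benefits of a true quantized kernel. The fixed seed and the `max_err` check are additions for illustration and are not part of the original question.

```python
import torch

torch.manual_seed(0)  # fixed seed (an assumption) so the run is reproducible
x = torch.rand(10, 3)
y = torch.rand(10, 3)

# Same quantization parameters as in the question.
scale, zero_point = 1e-4, 2
dtype = torch.qint32
qx = torch.quantize_per_tensor(x, scale, zero_point, dtype)
qy = torch.quantize_per_tensor(y, scale, zero_point, dtype)

# Convert back to float32, then use the ordinary matmul operator.
result = qx.dequantize() @ qy.dequantize().T

# Largest deviation from the unquantized product; only rounding error
# from the quantize/dequantize round trip should remain.
max_err = (result - x @ y.T).abs().max().item()
print(result.shape, max_err)
```

With a scale of 1e-4, each element is rounded by at most half a quantization step, so the result should stay close to the float product x @ y.T.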
Shai