I am new to tensor quantization, and tried doing something as simple as
import torch
x = torch.rand(10, 3)
y = torch.rand(10, 3)
x@y.T
with PyTorch quantized tensors running on CPU. I thus tried
scale, zero_point = 1e-4, 2
dtype = torch.qint32
qx = torch.quantize_per_tensor(x, scale, zero_point, dtype)
qy = torch.quantize_per_tensor(y, scale, zero_point, dtype)
qx@qy.T # I tried...
..and got as error
RuntimeError: Could not run 'aten::mm' with arguments from the 'QuantizedCPUTensorId' backend. 'aten::mm' is only available for these backends: [CUDATensorId, SparseCPUTensorId, VariableTensorId, CPUTensorId, SparseCUDATensorId].
Is matrix multiplication just not supported, or am I doing something wrong?