
I have a quantized model in PyTorch and I want to extract the parameters of the quantized linear layer and implement the forward pass manually. I searched the source code but only found this function:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    return torch.ops.quantized.linear(
        x, self._packed_params._packed_params, self.scale, self.zero_point)

But nowhere can I find how torch.ops.quantized.linear itself is defined.

Can someone give me a hint as to how the forward pass of a quantized linear layer is defined?

stephen

1 Answer


In answer to the question of where torch.ops.quantized.linear is defined: I was looking for the same thing but was never able to find it. I believe it lives somewhere in ATen, PyTorch's C++ backend. I did, however, find some useful PyTorch-based implementations in NVIDIA's pytorch-quantization toolkit in the TensorRT repo below. It's quite possible these mirror what PyTorch actually calls through its compiled libraries. If you're trying to add quantization to a custom layer, these implementations walk you through it.
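Even without the ATen source, you can reproduce what the op does numerically. Below is a minimal sketch (my own, not PyTorch's actual kernel) that unpacks the quantized weight and bias from a quantized Linear via the known `_packed_params._weight_bias()` accessor, then emulates the forward as dequantize → float matmul → requantize with the layer's output scale and zero point. The toy model setup is an assumption for illustration; the real fbgemm/qnnpack kernels do the arithmetic in integers, so this emulation only matches to within about one quantization step of the output scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Build and statically quantize a toy float model so we have a
# quantized Linear layer to inspect (setup is illustrative only).
float_model = nn.Sequential(nn.Linear(4, 3))
float_model.eval()
float_model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(float_model)
prepared(torch.randn(8, 4))            # calibration pass for the observers
qmodel = torch.quantization.convert(prepared)
qlinear = qmodel[0]                    # the quantized Linear module

# Unpack the quantized weight (a qint8 tensor) and the float bias.
w_q, bias = qlinear._packed_params._weight_bias()

def manual_forward(x_q):
    # Dequantize input and weight, run a float linear, then requantize
    # the result with the layer's output scale and zero point.
    y = F.linear(x_q.dequantize(), w_q.dequantize(), bias)
    return torch.quantize_per_tensor(
        y, qlinear.scale, qlinear.zero_point, torch.quint8)

x = torch.randn(2, 4)
x_q = torch.quantize_per_tensor(x, scale=0.05, zero_point=64,
                                dtype=torch.quint8)
ref = qlinear(x_q)        # PyTorch's torch.ops.quantized.linear
mine = manual_forward(x_q)
```

Comparing `ref.dequantize()` and `mine.dequantize()` shows the emulation tracks the built-in op closely; any residual difference comes from the integer requantization rounding inside the real kernel.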

You can find the docs here and the GitHub page here.

For the linear layer specifically, see the QuantLinear layer here.

Under the hood, this calls TensorQuantFunction.apply() for post-training quantization or FakeTensorQuantFunction.apply() for quantization-aware training.