Questions tagged [quantization]

Use this tag for questions related to quantization of any kind, such as vector quantization.

Quantization, in mathematics and digital signal processing, is the process of mapping a large set of input values to a (countable) smaller set.

For more, please read the Wikipedia article on quantization.

444 questions
0 votes · 2 answers

How to find the size of a deep learning model?

I am working with different quantized implementations of the same model, the main difference being the precision of the weights, biases, and activations. So I'd like to know how I can find the difference between the size of a model in MBs that's in…
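
A minimal sketch of one way to measure both numbers in PyTorch, assuming an ordinary nn.Module: in-memory parameter size is element count times bytes per element, and the on-disk size can be read back after serializing the state_dict:

```python
import os
import tempfile

import torch
import torch.nn as nn

def param_size_mb(model: nn.Module) -> float:
    """In-memory bytes of parameters and buffers, reported in MB."""
    total = sum(t.numel() * t.element_size()
                for t in list(model.parameters()) + list(model.buffers()))
    return total / 1e6

def file_size_mb(model: nn.Module) -> float:
    """On-disk size of the serialized state_dict, in MB."""
    path = os.path.join(tempfile.mkdtemp(), "model.pt")
    torch.save(model.state_dict(), path)
    return os.path.getsize(path) / 1e6

model = nn.Linear(1024, 1024)   # stand-in for the real model
print(f"params: {param_size_mb(model):.2f} MB  file: {file_size_mb(model):.2f} MB")
```
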
0 votes · 1 answer

Algorithm for detecting voltage levels in voltage-vs-time data/waveform

I am analyzing a voltage output that I get from a SPICE simulator, and I want to quantize the time-sampled voltage data so that I can convert the given trapezoidal wave to a square wave. I have tried differentiation as a method to understand when the…
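
One differentiation-free approach, sketched here with assumed 0 V / 3.3 V logic levels: snap every sample to the nearest known level with NumPy, which turns a trapezoidal wave directly into a square wave:

```python
import numpy as np

def quantize_to_levels(v: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Snap every sample to the nearest entry in `levels`."""
    idx = np.argmin(np.abs(v[:, None] - levels[None, :]), axis=1)
    return levels[idx]

# Toy trapezoidal wave: a clipped sine stands in for the SPICE output.
t = np.linspace(0.0, 4.0, 2000)
trapezoid = np.clip(5.0 * np.sin(2.0 * np.pi * t), 0.0, 3.3)

square = quantize_to_levels(trapezoid, np.array([0.0, 3.3]))  # assumed logic levels
```
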
0 votes · 1 answer

Why did I get 'AssertionError: did not find fuser method for:' error while doing static quantization for a PyTorch model

I am getting the following error while trying to apply static quantization to a model. The error is in the fuse part of the code, torch.quantization.fuse_modules(model, modules_to_fuse): model = torch.quantization.fuse_modules(model,…
Celik • 2,311 • 2 • 32 • 54
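
For context, fuse_modules only recognizes a fixed set of layer patterns (Conv+BN, Conv+BN+ReLU, Linear+ReLU, and a few others), and the modules must be listed in the order they execute; anything else raises this AssertionError. A minimal sketch of a fusion that should succeed, on a toy model:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = Toy().eval()   # fusion for static quantization expects eval mode
# Names must match the module attributes, in execution order:
fused = torch.quantization.fuse_modules(model, [["conv", "bn", "relu"]])
```
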
0 votes · 1 answer

XGBoost model quantization - Sklearn model quantization

I am looking for solutions to quantize sklearn models, specifically XGBoost models. I found solutions to quantize PyTorch and TensorFlow models, but nothing for sklearn. Solutions tried: converted the sklearn model to ONNX and then…
pratsbhatt • 1,498 • 10 • 20
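
A hedged sketch of one possible route: export the booster with onnxmltools, then apply onnxruntime's dynamic quantization. Note that tree ensembles store split thresholds rather than matmul weights, so the size reduction may be modest; file names and shapes below are made up:

```python
import numpy as np
import xgboost as xgb
from onnxmltools import convert_xgboost
from onnxmltools.convert.common.data_types import FloatTensorType
from onnxruntime.quantization import quantize_dynamic, QuantType

X = np.random.rand(200, 8).astype(np.float32)
y = (X.sum(axis=1) > 4).astype(int)
model = xgb.XGBClassifier(n_estimators=10).fit(X, y)

# Export the booster to ONNX.
onnx_model = convert_xgboost(model, initial_types=[("input", FloatTensorType([None, 8]))])
with open("xgb.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Dynamic quantization rewrites eligible initializers as int8.
quantize_dynamic("xgb.onnx", "xgb_int8.onnx", weight_type=QuantType.QInt8)
```
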
0 votes · 1 answer

onnx.load() | ALBert throws DecodeError: Error parsing message

Goal: re-develop this BERT Notebook to use textattack/albert-base-v2-MRPC. Kernel: conda_pytorch_p36. PyTorch 1.8.1+cpu. I convert a PyTorch / HuggingFace Transformers model to ONNX and store it. DecodeError occurs on onnx.load(). Are my ONNX files…
DanielBell99 • 896 • 5 • 25 • 57
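
DecodeError from onnx.load() usually means the protobuf on disk is truncated, or the model was exported with external data files that are no longer next to it. A quick sanity check, with a hypothetical file name:

```python
import os
import onnx

path = "albert-base-v2.onnx"                     # hypothetical file name
print(f"{os.path.getsize(path) / 1e6:.1f} MB")   # ~0 MB suggests a truncated write

model = onnx.load(path)          # raises DecodeError on a corrupt/partial protobuf
onnx.checker.check_model(model)  # validates the graph once loading succeeds
```
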
0 votes · 1 answer

Converting PyTorch to ONNX model increases file size for ALBert

Goal: Use this Notebook to perform quantisation on the albert-base-v2 model. Kernel: conda_pytorch_p36. Outputs in Sections 1.2 & 2.2 show that converting vanilla BERT from PyTorch to ONNX keeps the file size the same, 417.6 MB. Quantization models are…
DanielBell99 • 896 • 5 • 25 • 57
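
One plausible cause worth checking: ALBERT shares a single set of transformer weights across all layers, and tracing-based export can materialize those shared tensors once per layer, so the ONNX file grows even though the checkpoint did not. A sketch of the export plus a direct size comparison (paths are made up):

```python
import os
import torch
from transformers import AlbertModel   # assumes transformers is installed

model = AlbertModel.from_pretrained("albert-base-v2", return_dict=False).eval()
dummy = torch.ones(1, 128, dtype=torch.long)      # fake input_ids

torch.onnx.export(model, (dummy,), "albert.onnx",
                  input_names=["input_ids"],
                  output_names=["last_hidden_state"],
                  opset_version=13)

torch.save(model.state_dict(), "albert.pt")
for path in ("albert.pt", "albert.onnx"):
    print(path, f"{os.path.getsize(path) / 1e6:.1f} MB")
```
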
0 votes · 1 answer

HuggingFace - 'optimum' ModuleNotFoundError

I want to run the 3 code snippets from this webpage. I've combined all 3 into one post, as I assume they all stem from the same problem of optimum not being imported correctly. Kernel: conda_pytorch_p36. Installations: pip install optimum OR ! pip…
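
In a multi-kernel Jupyter setup, a common culprit is pip installing into a different environment than the one the notebook kernel runs. A quick diagnostic sketch:

```python
import sys

print(sys.executable)          # the interpreter this kernel actually runs

# In a notebook cell, install into *this* interpreter's environment,
# not whatever `pip` happens to be first on PATH:
#   !{sys.executable} -m pip install optimum

import optimum                 # should resolve after the install above
print(optimum.__file__)        # confirms which site-packages it came from
```
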
0 votes · 1 answer

Quantizing the neural network weights and biases to int16 format

I am trying to quantize the weights and biases of my neural network to a 16-bit integer format. The reason for this is to use these arrays in CCS to program the network on an MCU. While I followed the process for post-training quantization using…
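
A minimal NumPy sketch of symmetric 16-bit post-training quantization, assuming a single per-tensor scale; the scale is kept alongside the array so the C code on the MCU can undo it:

```python
import numpy as np

def quantize_int16(w: np.ndarray):
    """Symmetric per-tensor quantization: w ≈ q * scale, q in [-32767, 32767]."""
    scale = np.abs(w).max() / 32767.0
    q = np.clip(np.round(w / scale), -32767, 32767).astype(np.int16)
    return q, scale

weights = np.random.randn(64, 32).astype(np.float32)  # stand-in layer weights
q, scale = quantize_int16(weights)
reconstructed = q.astype(np.float32) * scale
print("max abs error:", np.abs(weights - reconstructed).max())
```
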
0 votes · 1 answer

LeNet5 inference based on a quantized TFLite model. How to downscale int32 to int8 with the M parameter?

I trained a LeNet5 CNN with Keras/TensorFlow. I used TensorFlow Lite to quantize FP32 weights and activations to INT8. I extracted and visualized weights, biases, scales and zero-points thanks to Netron. I needed to design the LeNet5 CNN in the C language.…
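
For reference, TFLite's convention is M = (S_input · S_weight) / S_output, stored as M = M0 · 2^(−n) with M0 a Q31 fixed-point integer in [0.5, 1). A NumPy sketch of that decomposition and the int32 → int8 requantization, with made-up scales and zero point (gemmlowp's saturating rounding-doubling multiply is simplified to a plain shift here):

```python
import math
import numpy as np

def quantize_multiplier(M: float):
    """Decompose M (< 1) into M0 * 2**-n, M0 stored as a Q31 integer in [0.5, 1)."""
    M0, exp = math.frexp(M)                  # M = M0 * 2**exp, 0.5 <= M0 < 1
    return int(round(M0 * (1 << 31))), -exp  # (q31 multiplier, right shift n >= 0)

def requantize(acc: np.ndarray, M: float, zero_point: int) -> np.ndarray:
    q31, shift = quantize_multiplier(M)
    x = (acc.astype(np.int64) * q31) >> 31        # multiply by M0 in Q31
    if shift > 0:
        x = (x + (1 << (shift - 1))) >> shift     # rounding right shift by n
    return np.clip(x + zero_point, -128, 127).astype(np.int8)

M = (0.02 * 0.005) / 0.1                          # made-up S_in * S_w / S_out
print(requantize(np.array([12345, -9876], dtype=np.int32), M, zero_point=-3))
```
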
0 votes · 1 answer

How to visualize feature maps of a TensorFlow Lite model?

I used Keract to visualize the feature maps of a TensorFlow/Keras model. I have applied quantization with TensorFlow Lite. I would like to visualize the feature maps generated by the TensorFlow Lite model during an inference. Do you know a way to do…
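
One way that may work, sketched below: build the interpreter with experimental_preserve_all_tensors=True (available in TF 2.5+) so intermediate tensors survive invoke(), then read any of them back with get_tensor(); model path, input, and tensor index are assumptions:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="model_int8.tflite",            # hypothetical file
    experimental_preserve_all_tensors=True,    # keep intermediates readable
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()

for t in interpreter.get_tensor_details():     # list every tensor in the graph
    print(t["index"], t["name"], t["shape"])

feature_map = interpreter.get_tensor(5)        # pick an index from the listing
```
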
0 votes · 0 answers

tensorflow.python.framework.errors_impl.InvalidArgumentError: Length for attr 'output_shapes' of 0 must be at least minimum 1

I am working on an image segmentation use case. So I created a Keras model with the ".h5" extension, and I am now trying to convert this Keras model to a TensorFlow Lite model, as the TensorFlow Lite model is required to run in the Android…
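
For reference, the conversion path itself is short; if loading the .h5 and converting as below still fails, allowing SELECT_TF_OPS sometimes gets past ops the converter cannot express as TFLite builtins. Paths are hypothetical:

```python
import tensorflow as tf

model = tf.keras.models.load_model("segmentation_model.h5")   # hypothetical path

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,     # normal TFLite kernels
    tf.lite.OpsSet.SELECT_TF_OPS,       # fall back to TF kernels when needed
]
tflite_model = converter.convert()

with open("segmentation_model.tflite", "wb") as f:
    f.write(tflite_model)
```
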
0 votes · 1 answer

Python sounddevice.rec() with dtype='int8' quantizes to zero

I am trying to plot my voice signal with different dtypes (the bits/sample, obviously). So I tried to capture my voice with dtype='int16' and the plot made sense. But when I spoke at the same sound level with dtype='int8', my plot is a…
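
That behavior is expected from the dynamics alone: int8 spans only −128…127, so a voice signal peaking at a few percent of full scale rounds to −1…1 and plots as near-zero. A sketch that normalizes both captures to full scale for comparison (device defaults assumed):

```python
import matplotlib.pyplot as plt
import numpy as np
import sounddevice as sd

fs, seconds = 44100, 3

for dtype in ("int16", "int8"):
    rec = sd.rec(int(fs * seconds), samplerate=fs, channels=1, dtype=dtype)
    sd.wait()
    full_scale = np.iinfo(dtype).max           # 32767 vs. 127
    plt.plot(rec[:, 0] / full_scale,
             label=f"{dtype}: peak {np.abs(rec).max()} of {full_scale}")

plt.legend()
plt.show()
```
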
0 votes · 1 answer

Quantization aware training worse than post-quantization

I want to use quantization aware training to quantize my model to int8. Unfortunately, I can't simply quantize the entire model, since my first layer is a batch normalization (after the InputLayer), so I need to use a custom quantizationConfig for that…
horsti • 113 • 1 • 7
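
With tensorflow_model_optimization, one common pattern is to annotate layers selectively and hand the unsupported layer a custom QuantizeConfig; the pass-through config below is a sketch (the stand-in model and names are assumptions, and the config body would be adapted to what the layer actually needs):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize = tfmot.quantization.keras

class NoOpQuantizeConfig(quantize.QuantizeConfig):
    """Pass-through config: quantize nothing inside the annotated layer."""
    def get_weights_and_quantizers(self, layer): return []
    def get_activations_and_quantizers(self, layer): return []
    def set_quantize_weights(self, layer, quantize_weights): pass
    def set_quantize_activations(self, layer, quantize_activations): pass
    def get_output_quantizers(self, layer): return []
    def get_config(self): return {}

def annotate(layer):
    if isinstance(layer, tf.keras.layers.InputLayer):
        return layer
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        return quantize.quantize_annotate_layer(
            layer, quantize_config=NoOpQuantizeConfig())
    return quantize.quantize_annotate_layer(layer)

model = tf.keras.Sequential([                  # stand-in with the same structure
    tf.keras.layers.InputLayer(input_shape=(16,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10),
])

annotated = tf.keras.models.clone_model(model, clone_function=annotate)
with quantize.quantize_scope({"NoOpQuantizeConfig": NoOpQuantizeConfig}):
    qat_model = quantize.quantize_apply(annotated)
```
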
0 votes · 1 answer

INT8 quantization for matmul

Inspired by "Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model", I decided to follow the paper through, caveats included. However, I get confused about setting the offset variables during quantization. INPUT :…
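
For reference on where the offsets enter: with x = S·(q − z), the product of two quantized matrices expands so that both zero points must be subtracted before (or algebraically corrected after) the integer accumulation. A small NumPy check with arbitrary data:

```python
import numpy as np

def quant_params(x, qmin=0, qmax=255):
    """Asymmetric uint8 parameters such that x ≈ scale * (q - zero_point)."""
    lo, hi = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zp):
    return np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)

A = np.random.randn(4, 8).astype(np.float32)
B = np.random.randn(8, 5).astype(np.float32)
(sa, za), (sb, zb) = quant_params(A), quant_params(B)
qa, qb = quantize(A, sa, za), quantize(B, sb, zb)

# Subtract both offsets first, accumulate in int32, rescale once at the end.
acc = (qa.astype(np.int32) - za) @ (qb.astype(np.int32) - zb)
print(np.abs(sa * sb * acc - A @ B).max())   # small quantization error
```
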
0 votes · 1 answer

INT8 quantization for FP32 matrix multiplication

I tried to apply INT8 quantization before an FP32 matrix multiplication, then requantize the accumulated INT32 output back to INT8. I suspect there are a couple of mix-ups somewhere in the process, and I feel stuck in spotting those…
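
As a debugging reference, a compact end-to-end pipeline in NumPy, kept symmetric per-tensor for simplicity: quantize both operands to INT8, accumulate in INT32, then requantize with the combined scale S_A·S_B/S_out:

```python
import numpy as np

def sym_scale(x):                      # symmetric: x ≈ q * scale, q in [-127, 127]
    return np.abs(x).max() / 127.0

A = np.random.randn(4, 6).astype(np.float32)
B = np.random.randn(6, 3).astype(np.float32)
C_fp32 = A @ B

sa, sb = sym_scale(A), sym_scale(B)
qa = np.round(A / sa).astype(np.int8)          # |q| <= 127 by construction
qb = np.round(B / sb).astype(np.int8)

acc = qa.astype(np.int32) @ qb.astype(np.int32)   # exact INT32 accumulation

so = sym_scale(C_fp32)                 # output scale; needs calibration in practice
qc = np.clip(np.round(acc * (sa * sb / so)), -127, 127).astype(np.int8)

print(np.abs(qc * so - C_fp32).max())  # reconstruction error vs. FP32
```
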