Questions tagged [quantization]

Use this tag for questions related to quantization of any kind, such as vector quantization.

Quantization, in mathematics and digital signal processing, is the process of mapping a large set of input values to a smaller (countable) set.

For more, please read the Wikipedia article.
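
As a concrete illustration of the definition above, here is a minimal uniform-quantization sketch in NumPy; the helper quantize_uniform and its parameters are made up for illustration only, not taken from any question below.

```python
import numpy as np

def quantize_uniform(x, num_levels=16):
    """Map continuous values in [x.min(), x.max()] onto num_levels discrete levels."""
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (num_levels - 1)        # width of one quantization bin
    indices = np.round((x - lo) / step)        # integer code for each sample
    return lo + indices * step                 # reconstructed (quantized) values

x = np.random.randn(1000).astype(np.float32)  # large set of input values
xq = quantize_uniform(x, num_levels=16)       # at most 16 distinct output values
print(len(np.unique(xq)))                     # <= 16
```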

444 questions
0 votes, 0 answers

How to do explicit quantization with TensorRT by setting weights, biases and scales?

I have done the following steps as inputs to the problem: trained an MNIST model using TensorFlow 2.11 (see link below), made the model Quantization Aware (QA) using tfmot.quantization.keras.quantize_model, trained the QA model a bit extra to adapt to…
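
The tfmot.quantization.keras.quantize_model step the asker describes typically looks like the minimal sketch below; the build_mnist_model helper and training settings are assumptions for illustration, not the asker's code.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def build_mnist_model():
    # Hypothetical stand-in for the asker's MNIST model
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

model = build_mnist_model()
# Wrap the model with fake-quantization nodes for quantization-aware training
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
# qat_model.fit(x_train, y_train, epochs=1)  # brief fine-tuning to adapt to quantization
```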
0 votes, 0 answers

Error while quantizing a transformer in PyTorch

input = torch.randn(2, 6, 1024); t = torch.randn(2); print(model_static_quantized(input, t)) gives this error: NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. This could be because the operator doesn't…
Timothy Oh • 147 • 5
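
That particular NotImplementedError usually means a float tensor reached a quantized module. A minimal eager-mode static-quantization sketch with QuantStub/DeQuantStub is shown below; the TinyNet module is hypothetical, not the asker's transformer.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # converts float input to int8
        self.fc = nn.Linear(1024, 1024)
        self.dequant = torch.quantization.DeQuantStub()  # converts int8 output back to float
    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyNet().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
model(torch.randn(8, 1024))                              # calibration pass
torch.quantization.convert(model, inplace=True)
print(model(torch.randn(2, 1024)))                       # now runs on the quantized backend
```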
0 votes, 0 answers

How does post-training quantization work?

I am currently interested in model quantization, especially post-training quantization of neural network models. I simply want to convert an existing model (i.e., a TensorFlow model) with float32 weights into a quantized model with float16…
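
For the float32-to-float16 case described here, TensorFlow's post-training quantization is usually done through the TFLite converter; a minimal sketch follows, where "saved_model_dir" is a placeholder path.

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]   # store weights as float16
tflite_fp16_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_fp16_model)
```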
0 votes, 0 answers

Can I post-training quantize a TensorFlow model to an arbitrary bit width using QKeras?

I want to train a model at full precision (float32) using Keras, and then quantize it to an arbitrary number of bits using QKeras. Is this possible? The common use case for QKeras is to predefine the model in QKeras APIs and use the quantized…
Tortellini Teusday • 1,335 • 1 • 12 • 21
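
QKeras is normally used the way the asker describes, with the quantized model defined up front; a minimal sketch of that usual pattern is below, assuming the QKeras quantized_bits/QDense API, with an arbitrary example width of 4 bits.

```python
import tensorflow as tf
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

# Quantized model defined up front with QKeras layers (the usual QKeras workflow);
# the 4-bit widths are arbitrary example values.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    QDense(64,
           kernel_quantizer=quantized_bits(4, 0, 1),
           bias_quantizer=quantized_bits(4, 0, 1)),
    QActivation(quantized_relu(4)),
    QDense(10,
           kernel_quantizer=quantized_bits(4, 0, 1),
           bias_quantizer=quantized_bits(4, 0, 1)),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```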
0 votes, 0 answers

Why are the weights of my QAT tf_model floats and not 8-bit integers?

I performed a simple Quantization Aware Training with TensorFlow on MNIST as follows: import numpy as np import tensorflow as tf from tensorflow import keras from tensorflow.keras.datasets import mnist # Load MNIST dataset mnist =…
nechi • 53 • 6
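
A likely answer to this kind of question is that quantization-aware training only simulates int8 with fake-quant nodes, so the Keras weights stay float32; real integer weights appear only after converting to TFLite. A minimal sketch of that conversion, with a small stand-in model rather than the asker's code, is:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Small stand-in for the asker's MNIST model; QAT only inserts fake-quant nodes,
# so the Keras variables stay floating point.
base = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28)),
                            tf.keras.layers.Dense(10)])
qat_model = tfmot.quantization.keras.quantize_model(base)
print([w.dtype for w in qat_model.trainable_weights])  # kernels/biases are still float32

# Real int8 weights only exist in the converted TFLite model.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_int8_model = converter.convert()
```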
0 votes, 0 answers

Quantization-aware training (QAT) with 32-bit convolution-output quantization (as a sanity check): accuracy drops too much

I modified Distiller (https://github.com/IntelLabs/distiller) to emulate an in-memory computing circuit; in particular, I added convolution-layer output quantization during QAT. However, accuracy drops by over 60% (90% → 30%) even with 32b quantization for sanity…
Y.J.Jo • 1 • 1
0 votes, 0 answers

Quantization-aware training in TensorFlow makes my model always predict the same class

I'm using tensorflow_model_optimization.quantization.keras.quantize_model on a VGG16 model pretrained on ImageNet, to test transfer learning on CIFAR-10. When trained normally for 5 epochs, accuracy reaches 60%, but when quantized it directly…
0 votes, 0 answers

Post-training int8 quantization and pruning of my model after training it with ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8

I'm trying to run inference with my model on an Arduino 33 BLE, so I trained it using ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8. I got a model size of 6.5 MB with 88% mAP@0.5IOU, which is nice, so I tried to quantize the model using int8 but the…
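
Full int8 post-training quantization for a microcontroller target generally needs a representative dataset and integer input/output types; a minimal sketch follows, where the representative_data generator and the saved-model path are placeholders rather than the asker's pipeline.

```python
import numpy as np
import tensorflow as tf

def representative_data():
    # Placeholder calibration data; in practice, yield real preprocessed training images.
    for _ in range(100):
        yield [np.random.rand(1, 320, 320, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")  # placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8     # integer I/O for microcontroller runtimes
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```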
0 votes, 0 answers

How to make quantized tflite support the tf.pow function?

The original tf.pow function cannot be quantized, so I want to adjust the implementation of the pow function so that tflite can support it. The original formula is Po = Pi ^ gamma, changed to ln(Po) =…
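
The rewrite the asker starts to describe is the identity Po = exp(gamma * ln(Pi)), which replaces POW with LOG, MUL and EXP; whether those ops quantize cleanly depends on the converter version. A minimal sketch, with example gamma and epsilon values, is:

```python
import tensorflow as tf

def pow_via_log(pi, gamma, eps=1e-6):
    """Compute Pi ** gamma as exp(gamma * ln(Pi)); valid for Pi > 0 (eps guards against log(0))."""
    return tf.exp(gamma * tf.math.log(tf.maximum(pi, eps)))

x = tf.constant([0.1, 0.5, 1.0, 2.0])
print(pow_via_log(x, 2.2))   # close to tf.pow(x, 2.2) for positive inputs
```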
0 votes, 0 answers

Quantizing JPEG files using Pillow increases file size

I'm trying to use the Pillow library to quantize JPEGs to decrease file size but I find that quantizing and converting back to RGB results in a larger file size. I am saving the JPEG with the default quality of 75. I have also tried converting the…
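
The pattern the asker describes, quantizing with Pillow and converting back to RGB before re-saving as JPEG, looks roughly like the sketch below (the file names and the 64-colour palette size are arbitrary). Note that re-encoding a palettized image as JPEG can indeed grow the file, because posterized regions have hard edges that JPEG compresses less efficiently than smooth gradients.

```python
from PIL import Image

img = Image.open("input.jpg")                    # placeholder file name
quantized = img.quantize(colors=64)              # reduce to a 64-colour palette ("P" mode)
rgb = quantized.convert("RGB")                   # JPEG cannot store palettized images directly
rgb.save("output.jpg", quality=75)               # same default quality the asker mentions
```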
0 votes, 2 answers

Is there a way for a single GPU and model to run deep-learning prediction/inference in parallel?

If there's an 8 GB GPU that has already loaded a model taking all 8 GB of RAM, is it possible to run multiple model predictions/inferences in parallel, or can you only run one prediction at a time?
0 votes, 1 answer

Conversion of a TensorFlow Lite model to F16 and INT8

I need to evaluate the performance of a CNN (Convolutional Neural Network) on an edge device. I started by understanding what quantization is and how to run it in Colab using the interpreter (emulator). Full code is here ->…
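
Evaluating a converted model in Colab with the TFLite interpreter (the emulator the asker mentions) usually follows the pattern below; the model file name and the zero-filled sample are placeholders.

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")  # placeholder file
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one sample with the dtype/shape the converted model expects.
sample = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction.shape)
```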
0 votes, 0 answers

PyTorch training with Brevitas gives NaN after the first step

So, I'm trying to implement the VGG16 architecture with PyTorch and Brevitas for FPGA. While training the model, my loss comes out to be NaN. My input images are correct (I debugged this with a couple of outputs), but there is no prediction…
0 votes, 0 answers

TFLite quantization with multiple inputs

I have a model in saved_model format, and I need to convert it to tflite with quantization. The problem is that the model has two input nodes, named "serving_default_input.1" and "serving_default_input.81", and I'm confused about the conversion code. I write…
heiheihei • 659 • 1 • 6 • 15
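
For a two-input SavedModel, the main difference from the single-input case is that the representative dataset must yield one array per input, in the order the converter reports them; a sketch under those assumptions (the shapes and the path are made up, not the asker's model) is:

```python
import numpy as np
import tensorflow as tf

def representative_data():
    # Yield one calibration array per input node, in the model's input order.
    # Shapes here are placeholders; they must match the real inputs.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32),   # e.g. serving_default_input.1
               np.random.rand(1, 10).astype(np.float32)]            # e.g. serving_default_input.81

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
tflite_model = converter.convert()
```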
0 votes, 1 answer

Image quantization

On Efford's CD there is code for grayscale image quantization: int n = 8 - numBits; // numBits will be taken as input float scale = 255.0f / (255 >> n); byte[] tableData = new byte[256]; for (int i = 0; i < 256; ++i) tableData[i] = (byte)…
ashish nirkhe • 689 • 3 • 10 • 22
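
The Java snippet builds a lookup table that keeps only numBits of each 8-bit gray value and rescales it back to the 0–255 range; an equivalent sketch in Python/NumPy (not Efford's code, just an illustration of the same idea) is:

```python
import numpy as np

def quantize_gray(image, num_bits):
    """Reduce an 8-bit grayscale image to num_bits, rescaled back to 0..255."""
    n = 8 - num_bits
    scale = 255.0 / (255 >> n)                # same scaling as the Java lookup table
    table = np.array([round(scale * (i >> n)) for i in range(256)], dtype=np.uint8)
    return table[image]                       # apply the lookup table pixel-wise

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
print(quantize_gray(img, num_bits=3))         # only 8 distinct gray levels remain
```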