Questions tagged [quantization]

Use this tag for questions related to quantization of any kind, such as vector quantization.

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a (countable) smaller set.

For more, please read the Wikipedia article on quantization.

444 questions
1
vote
0 answers

Error when running a Hugging Face model in 4-bit mode in Streamlit using bitsandbytes: quant state is unexpectedly being set to None

I am loading a Hugging Face starchat-beta model in Streamlit and caching it as follows: @st.cache_resource def load_model(): """Initialize the tokenizer and the AI model.""" tokenizer =…
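A minimal sketch (not the asker's code) of the pattern this question describes: loading a 4-bit bitsandbytes-quantized model behind st.cache_resource. The model id and quantization settings below are illustrative assumptions.

```python
# Illustrative sketch: load a 4-bit bitsandbytes-quantized model once and
# cache it for the Streamlit session. Model id and settings are assumptions.
import streamlit as st
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

@st.cache_resource
def load_model(model_id: str = "HuggingFaceH4/starchat-beta"):
    """Initialize the tokenizer and the 4-bit quantized model."""
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",   # requires accelerate; places layers on GPU/CPU
    )
    return tokenizer, model

tokenizer, model = load_model()
```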
1
vote
0 answers

How does TFLite calculate quantized dense layers

I am trying to figure out how exactly quantized dense layers are calculated. I found this blog post explaining the process and wanted to reimplement the code, but with the parameters extracted from a TFLite Model. However, I am not seeing the…
Necrotos
  • 61
  • 8
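For orientation on this question: TFLite's quantized kernels work with affine-quantized integers (real ≈ scale · (q − zero_point)) and accumulate in int32 before rescaling to the output quantization. The numpy sketch below reproduces that arithmetic only; the layout and exact rounding of the real kernels are simplified assumptions.

```python
# Conceptual sketch of an affine-quantized dense (fully connected) layer.
# Not the exact TFLite kernel; it illustrates the integer arithmetic.
import numpy as np

def quantized_dense(q_in, q_w, bias_int32, s_in, z_in, s_w, z_w, s_out, z_out):
    # q_in: (batch, in) int8, q_w: (in, out) int8 for this sketch.
    acc = (q_in.astype(np.int32) - z_in) @ (q_w.astype(np.int32) - z_w)
    acc += bias_int32              # bias is stored as int32 with scale s_in * s_w
    q_out = np.round(acc * (s_in * s_w / s_out)) + z_out   # requantize
    return np.clip(q_out, -128, 127).astype(np.int8)
```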
1
vote
1 answer

Llama QLoRA error: Target modules ['query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h'] not found in the base model

EDIT: solved by removing target_modules. I tried to load the Llama-2-7b-hf LLM with QLoRA using the following code: model_id = "meta-llama/Llama-2-7b-hf" tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True) # I have permissions. model…
Ofir
  • 590
  • 9
  • 19
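The excerpt notes the asker solved this by removing target_modules; a hedged sketch of the two usual options follows. The listed module names are the standard Llama-2 attention projection names, and the remaining hyperparameters are placeholders, not values from the question.

```python
# Sketch: Llama-2 uses q_proj/k_proj/v_proj/o_proj rather than the
# Falcon-style query_key_value/dense_* names from the error message.
# Either omit target_modules so PEFT infers defaults for the architecture,
# or list the Llama module names explicitly.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                      # placeholder hyperparameters
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # target_modules omitted -> defaults for the model architecture, or:
    # target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```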
1
vote
1 answer

Load ONNX model with low-bit quantized weight

I have quantized my model's weights to 2-bit and packed them into uint8 format (storing four 2-bit weights in one uint8 variable) in PyTorch. In this way, the model size has been reduced from 1545M to 150M, and the VRAM for loading the model is also greatly reduced…
Yefei He
  • 11
  • 1
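A small numpy sketch of the storage scheme the question describes, four 2-bit values per uint8; the packing order and helper names are assumptions made for illustration.

```python
# Hypothetical pack/unpack helpers: four 2-bit codes per uint8 byte.
import numpy as np

def pack_2bit(w):
    # w: uint8 array of codes in [0, 3], length divisible by 4
    w = w.reshape(-1, 4)
    return (w[:, 0] | (w[:, 1] << 2) | (w[:, 2] << 4) | (w[:, 3] << 6)).astype(np.uint8)

def unpack_2bit(packed):
    # recover the four codes from each byte, least-significant pair first
    return np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)

codes = np.random.randint(0, 4, size=16, dtype=np.uint8)
assert np.array_equal(unpack_2bit(pack_2bit(codes)), codes)
```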
1
vote
0 answers

8bit Quantization: Prediction outputs uncorrelated to underlying model

I quantized a basic TFLite regression model to int8 but the prediction output seems to be highly uncorrelated with the actual underlying model prior to quantizing it. All the code and steps taken to train and quantize the model are shown below to…
Bemz
  • 129
  • 1
  • 16
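Not a diagnosis of the asker's model, but for comparison: full-integer int8 conversion needs a representative dataset that covers the real input range, and the resulting model then expects int8 inputs at inference, both frequent sources of this symptom. The sketch below uses a placeholder model and random calibration data.

```python
# Sketch of full-integer (int8) post-training quantization of a small
# Keras regression model; model, data, and sizes are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(1),
])

def representative_data():
    # Calibration samples should cover the real input range of the model.
    for _ in range(100):
        yield [np.random.uniform(-1.0, 1.0, size=(1, 1)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8    # inputs must then be quantized int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```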
1
vote
0 answers

Is there a way to use f32 format for Math ops in TFLite while keeping other ops in quantized format during Keras model conversion?

Possible way to skip quantization for certain layers / ops in the TFLite converter. Hello everyone, is there a possible way to skip quantization for certain layers / ops when converting a Keras model to a TFLite model? Specifically for the Math ops supported…
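One commonly suggested direction (a sketch under assumptions, not a confirmed answer to this question) is to list both op sets so the converter keeps ops it cannot quantize as float32 kernels instead of forcing everything to int8.

```python
# Sketch: allow float32 fallback for ops the converter cannot quantize by
# listing both op sets. Model and calibration data are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

def rep_data():
    for _ in range(10):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_data
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8,  # int8 kernels where available
    tf.lite.OpsSet.TFLITE_BUILTINS,       # float32 fallback for the rest
]
tflite_model = converter.convert()
```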
1
vote
0 answers

Dequant layer in tflite model

I am running the training example for Quantization aware training as is in Google Colab. You can find the tutorial here. My exact code can be found here. I expected that all the layers in the output frozen model (.tflite) would be quantized…
1
vote
0 answers

JPEG Python 8x8 window DCT and quantisation process

I am trying to build a simple JPEG compression process in Python using DCT and quantisation, but not the Huffman coding. This is what I have done so far (compress and uncompress the same image): import cv2 as cv from scipy.fftpack import dct,…
markman8
  • 31
  • 4
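For context, the core of such a pipeline is an 8x8 block DCT followed by division by a quantization table and rounding. The sketch below (standard JPEG luminance table, level shift, no colour handling and no Huffman coding) is an illustration, not the asker's code.

```python
# Sketch: per-block DCT + quantization / dequantization + inverse DCT
# for 8x8 luminance blocks, as in a simplified JPEG pipeline.
import numpy as np
from scipy.fftpack import dct, idct

Q_LUMA = np.array([            # standard JPEG luminance quantization table
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def dct2(block):
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(block):
    return idct(idct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def compress_block(block_uint8):
    shifted = block_uint8.astype(np.float32) - 128.0   # level shift
    return np.round(dct2(shifted) / Q_LUMA)            # quantize coefficients

def decompress_block(q_coeffs):
    coeffs = q_coeffs * Q_LUMA                         # dequantize
    return np.clip(idct2(coeffs) + 128.0, 0, 255).astype(np.uint8)
```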
1
vote
0 answers

Pytorch: Quantize weights to unsigned 8 bit

I am trying to quantize the weights of the BERT model to unsigned 8 bits, working with PyTorch. It works well with the int8 datatype, but throws an error for unsigned 8-bit. I am using the 'dynamic_quantize' function for this. quantized_model…
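For reference, the documented entry point is torch.quantization.quantize_dynamic, and it quantizes weights with the signed torch.qint8 dtype; requesting an unsigned weight dtype is likely what triggers the error here. A minimal sketch with an illustrative checkpoint:

```python
# Sketch: dynamic quantization of a BERT model's Linear layers to signed
# int8 weights. The checkpoint name is only an example.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},   # only Linear layers are dynamically quantized
    dtype=torch.qint8,   # weights use the signed qint8 dtype
)
```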
1
vote
0 answers

Custom 8bit quantized inference in Tensorflow

I have a simple fully connected model with the following architecture: Input(10) → Dense(10) → Dense(10) → Dense(10) → Dense(10) → Dense(1). I performed int8 QAT on it, and TFLite inference. I want to perform QAT with int8 but with some changes to…
Stsh4lson
  • 31
  • 4
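A hedged sketch of plain int8 QAT on that architecture with the TensorFlow Model Optimization toolkit, before any of the custom changes the question is after; hyperparameters and training data are placeholders.

```python
# Sketch: quantization-aware training of the described fully connected
# model, then conversion to an int8 TFLite model. Details are placeholders.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
])

qat_model = tfmot.quantization.keras.quantize_model(model)   # insert fake-quant nodes
qat_model.compile(optimizer="adam", loss="mse")
# qat_model.fit(x_train, y_train, epochs=5)                  # train with fake quantization

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```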
1
vote
0 answers

Does Pytorch PTQ (Post Training Quantization) actually use quantized integer weights for inference?

After PyTorch post-training quantization, I find that the forward propagation of the quantized model still seems to use dequantized float32 weights rather than quantized int8. Below I attached the PTQ example given in the PyTorch quantization…
Asher Mai
  • 11
  • 2
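One way to check what a converted model actually stores (a sketch of the eager-mode PTQ flow, not the asker's exact example) is to pull the packed weight out of the quantized Linear module and look at its integer representation.

```python
# Sketch: eager-mode post-training static quantization, then inspection of
# the stored weights. After convert(), the Linear weights are int8 tensors.
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(4, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
torch.quantization.prepare(model, inplace=True)
model(torch.randn(32, 4))                 # calibration pass
torch.quantization.convert(model, inplace=True)

qweight = model.fc.weight()               # quantized weight tensor
print(qweight.dtype)                      # torch.qint8
print(qweight.int_repr())                 # the raw int8 values that are stored
```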
1
vote
0 answers

SnapML model issue on Lens Studio

I built a custom model for classifying images of cars using Tensorflow and Keras, to use it for building a Snap lens powered by machine learning. Lens Studio only accepts quantized models; the model had to go through the quantization process using…
1
vote
1 answer

Gridding and labeling 3D points

I have a matrix of 3D points. I want to quantize them into a spatial grid and later represent that matrix by their grid numbers. Let's say the matrix is D: D = np.array([[-45.08341177, 34.40457052, 7.63253164], [-46.81391587, 34.35034554, …
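A minimal numpy sketch of one way to do this (the cell size and example points are placeholders, not the asker's data): floor-divide by the cell size to get per-axis indices, then collapse each unique (i, j, k) triple into a single label.

```python
# Sketch: quantize 3D points onto a spatial grid and label each point by
# its cell. cell_size and the example points are illustrative only.
import numpy as np

D = np.array([
    [-45.08, 34.40, 7.63],
    [-46.81, 34.35, 7.59],
    [ 12.30, -3.10, 0.42],
])

cell_size = 5.0
ijk = np.floor((D - D.min(axis=0)) / cell_size).astype(int)  # per-axis cell indices

# Give every occupied cell one integer id and label each point with it.
cells, labels = np.unique(ijk, axis=0, return_inverse=True)
print(ijk)      # (i, j, k) grid coordinates per point
print(labels)   # one grid number per point
```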
1
vote
0 answers

Is there a problem with DCT or quantization when encoding JPEG files?

When I built my own BMP-to-JPEG encoder, I found that the resulting JPEG image was very different from the original. After comparing the RGB images to the YUV images, I found that the two images were exactly the same. Instead of optimizing the DCT…
TIger_zh
  • 11
  • 2
1
vote
1 answer

How does PyTorch implement forward for a quantized linear layer?

I have a quantized model in PyTorch and now I want to extract the parameters of the quantized linear layer and implement the forward pass manually. I searched the source code but only found this function: def forward(self, x: torch.Tensor) -> torch.Tensor: …
stephen
  • 11
  • 1
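A hedged sketch of reproducing such a forward pass by hand: build a small statically quantized model, then redo the Linear step with the extracted scale/zero-point and dequantized weights. This mirrors the arithmetic only; the fused int8 kernel rounds slightly differently, so the comparison is approximate.

```python
# Sketch: manual forward for a quantized nn.Linear using its extracted
# parameters. Matches the module's output only up to requantization error.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(4, 3)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
model(torch.randn(64, 4))                          # calibration
torch.quantization.convert(model, inplace=True)

x = torch.randn(2, 4)
reference = model(x)

# Manual path: quantize/dequantize the input like QuantStub, run a float
# matmul with the dequantized int8 weights, then requantize to the layer's
# output scale/zero-point and dequantize like DeQuantStub.
x_dq = torch.quantize_per_tensor(
    x, float(model.quant.scale), int(model.quant.zero_point), torch.quint8
).dequantize()
w = model.fc.weight().dequantize()
y = F.linear(x_dq, w, model.fc.bias())
y_q = torch.quantize_per_tensor(
    y, float(model.fc.scale), int(model.fc.zero_point), torch.quint8
)
manual = y_q.dequantize()

print(torch.allclose(reference, manual, atol=float(model.fc.scale)))
```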