Questions tagged [quantization]

Use this tag for questions related to quantization of any kind, such as vector quantization.

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a (countable) smaller set.

For more, please read the Wikipedia article on quantization.

444 questions
1
vote
0 answers

Error when running a Hugging Face model in 4-bit mode in Streamlit using bitsandbytes: quant state is unexpectedly being set to None

I am loading a Hugging Face starchat-beta model in Streamlit and caching it as follows: @st.cache_resource def load_model(): """Initialize the tokenizer and the AI model.""" tokenizer =…
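A minimal sketch (not the asker's code) of the pattern this question describes: loading a 4-bit bitsandbytes-quantized model behind st.cache_resource. The model id and quantization settings below are illustrative assumptions.

```python
# Illustrative sketch: load a 4-bit bitsandbytes-quantized model once and
# cache it for the Streamlit session. Model id and settings are assumptions.
import streamlit as st
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

@st.cache_resource
def load_model(model_id: str = "HuggingFaceH4/starchat-beta"):
    """Initialize the tokenizer and the 4-bit quantized model."""
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",   # requires accelerate; places layers on GPU/CPU
    )
    return tokenizer, model

tokenizer, model = load_model()
```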
1
vote
0 answers

How does TFLite calculate quantized dense layers

I am trying to figure out how exactly quantized dense layers are calculated. I found this blog post explaining the process and wanted to reimplement the code, but with the parameters extracted from a TFLite Model. However, I am not seeing the…
Necrotos
  • 61
  • 8
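For orientation on this question: TFLite's quantized kernels work with affine-quantized integers (real ≈ scale · (q − zero_point)) and accumulate in int32 before rescaling to the output quantization. The numpy sketch below reproduces that arithmetic only; the layout and exact rounding of the real kernels are simplified assumptions.

```python
# Conceptual sketch of an affine-quantized dense (fully connected) layer.
# Not the exact TFLite kernel; it illustrates the integer arithmetic.
import numpy as np

def quantized_dense(q_in, q_w, bias_int32, s_in, z_in, s_w, z_w, s_out, z_out):
    # q_in: (batch, in) int8, q_w: (in, out) int8 for this sketch.
    acc = (q_in.astype(np.int32) - z_in) @ (q_w.astype(np.int32) - z_w)
    acc += bias_int32              # bias is stored as int32 with scale s_in * s_w
    q_out = np.round(acc * (s_in * s_w / s_out)) + z_out   # requantize
    return np.clip(q_out, -128, 127).astype(np.int8)
```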
1
vote
1 answer

Llama QLoRA error: Target modules ['query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h'] not found in the base model

EDIT: solved by removing target_modules. I tried to load the Llama-2-7b-hf LLM with QLoRA using the following code: model_id = "meta-llama/Llama-2-7b-hf" tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True) # I have permissions. model…
Ofir
  • 590
  • 9
  • 19
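The excerpt notes the asker solved this by removing target_modules; a hedged sketch of the two usual options follows. The listed module names are the standard Llama-2 attention projection names, and the remaining hyperparameters are placeholders, not values from the question.

```python
# Sketch: Llama-2 uses q_proj/k_proj/v_proj/o_proj rather than the
# Falcon-style query_key_value/dense_* names from the error message.
# Either omit target_modules so PEFT infers defaults for the architecture,
# or list the Llama module names explicitly.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                      # placeholder hyperparameters
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # target_modules omitted -> defaults for the model architecture, or:
    # target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```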
1
vote
1 answer

Load ONNX model with low-bit quantized weight

I have quantized my model's weights to 2-bit and packed them into uint8 format (storing four 2-bit weights in one uint8 variable) in PyTorch. In this way, the model size has been reduced from 1545M to 150M, and the VRAM for loading the model is also greatly reduced…
Yefei He
  • 11
  • 1
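A small numpy sketch of the storage scheme the question describes, four 2-bit values per uint8; the packing order and helper names are assumptions made for illustration.

```python
# Hypothetical pack/unpack helpers: four 2-bit codes per uint8 byte.
import numpy as np

def pack_2bit(w):
    # w: uint8 array of codes in [0, 3], length divisible by 4
    w = w.reshape(-1, 4)
    return (w[:, 0] | (w[:, 1] << 2) | (w[:, 2] << 4) | (w[:, 3] << 6)).astype(np.uint8)

def unpack_2bit(packed):
    # recover the four codes from each byte, least-significant pair first
    return np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)

codes = np.random.randint(0, 4, size=16, dtype=np.uint8)
assert np.array_equal(unpack_2bit(pack_2bit(codes)), codes)
```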
1
vote
0 answers

8bit Quantization: Prediction outputs uncorrelated to underlying model

I quantized a basic TFLite regression model to int8 but the prediction output seems to be highly uncorrelated with the actual underlying model prior to quantizing it. All the code and steps taken to train and quantize the model are shown below to…
Bemz
  • 129
  • 1
  • 16
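Not a diagnosis of the asker's model, but for comparison: full-integer int8 conversion needs a representative dataset that covers the real input range, and the resulting model then expects int8 inputs at inference, both frequent sources of this symptom. The sketch below uses a placeholder model and random calibration data.

```python
# Sketch of full-integer (int8) post-training quantization of a small
# Keras regression model; model, data, and sizes are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(1),
])

def representative_data():
    # Calibration samples should cover the real input range of the model.
    for _ in range(100):
        yield [np.random.uniform(-1.0, 1.0, size=(1, 1)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8    # inputs must then be quantized int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```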
1
vote
0 answers

Is there a way to use f32 format for Math ops in TFLite while keeping other ops in quantized format during Keras model conversion?

Possible way to skip quantization for certain layers / ops in the TFLite converter. Hello everyone, is there a possible way to skip quantization for certain layers / ops when converting a Keras model to a TFLite model? Specifically for the Math ops supported…
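One commonly suggested direction (a sketch under assumptions, not a confirmed answer to this question) is to list both op sets so the converter keeps ops it cannot quantize as float32 kernels instead of forcing everything to int8.

```python
# Sketch: allow float32 fallback for ops the converter cannot quantize by
# listing both op sets. Model and calibration data are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

def rep_data():
    for _ in range(10):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_data
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8,  # int8 kernels where available
    tf.lite.OpsSet.TFLITE_BUILTINS,       # float32 fallback for the rest
]
tflite_model = converter.convert()
```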
1
vote
0 answers

Dequant layer in tflite model

I am running the training example for Quantization aware training as is in Google Colab. You can find the tutorial here. My exact code can be found here. I expected that all the layers in the output frozen model (.tflite) would be quantized…
1
vote
0 answers

JPEG Python 8x8 window DCT and quantisation process

I am trying to build a simple JPEG compression process in Python using DCT and quantisation, but not the Huffman coding. This is what I have done so far (compress and uncompress the same image): import cv2 as cv from scipy.fftpack import dct,…
markman8
  • 31
  • 4
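For context, the core of such a pipeline is an 8x8 block DCT followed by division by a quantization table and rounding. The sketch below (standard JPEG luminance table, level shift, no colour handling and no Huffman coding) is an illustration, not the asker's code.

```python
# Sketch: per-block DCT + quantization / dequantization + inverse DCT
# for 8x8 luminance blocks, as in a simplified JPEG pipeline.
import numpy as np
from scipy.fftpack import dct, idct

Q_LUMA = np.array([            # standard JPEG luminance quantization table
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def dct2(block):
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(block):
    return idct(idct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def compress_block(block_uint8):
    shifted = block_uint8.astype(np.float32) - 128.0   # level shift
    return np.round(dct2(shifted) / Q_LUMA)            # quantize coefficients

def decompress_block(q_coeffs):
    coeffs = q_coeffs * Q_LUMA                         # dequantize
    return np.clip(idct2(coeffs) + 128.0, 0, 255).astype(np.uint8)
```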
1
vote
0 answers

Pytorch: Quantize weights to unsigned 8 bit

I am trying to quantize the weights of the BERT model to unsigned 8 bits, working with PyTorch. It works well with the int8 datatype, but throws an error for unsigned 8-bit. I am using the 'dynamic_quantize' function for this. quantized_model…
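For reference, the documented entry point is torch.quantization.quantize_dynamic, and it quantizes weights with the signed torch.qint8 dtype; requesting an unsigned weight dtype is likely what triggers the error here. A minimal sketch with an illustrative checkpoint:

```python
# Sketch: dynamic quantization of a BERT model's Linear layers to signed
# int8 weights. The checkpoint name is only an example.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},   # only Linear layers are dynamically quantized
    dtype=torch.qint8,   # weights use the signed qint8 dtype
)
```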
1
vote
0 answers

Custom 8bit quantized inference in Tensorflow

I have a simple fully connected model with the following architecture: Input(10) → Dense(10) → Dense(10) → Dense(10) → Dense(10) → Dense(1). I performed int8 QAT on it, and TFLite inference. I want to perform QAT with int8 but with some changes to…
Stsh4lson
  • 31
  • 4
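A hedged sketch of plain int8 QAT on that architecture with the TensorFlow Model Optimization toolkit, before any of the custom changes the question is after; hyperparameters and training data are placeholders.

```python
# Sketch: quantization-aware training of the described fully connected
# model, then conversion to an int8 TFLite model. Details are placeholders.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
])

qat_model = tfmot.quantization.keras.quantize_model(model)   # insert fake-quant nodes
qat_model.compile(optimizer="adam", loss="mse")
# qat_model.fit(x_train, y_train, epochs=5)                  # train with fake quantization

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```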
1
vote
0 answers

Does Pytorch PTQ (Post Training Quantization) actually use quantized integer weights for inference?

After PyTorch post-training quantization, I find that the forward propagation of the quantized model still seems to use dequantized float32 weights rather than quantized int8. Below I attached the PTQ example given in the PyTorch quantization…
Asher Mai
  • 11
  • 2
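One way to check what a converted model actually stores (a sketch of the eager-mode PTQ flow, not the asker's exact example) is to pull the packed weight out of the quantized Linear module and look at its integer representation.

```python
# Sketch: eager-mode post-training static quantization, then inspection of
# the stored weights. After convert(), the Linear weights are int8 tensors.
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(4, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
torch.quantization.prepare(model, inplace=True)
model(torch.randn(32, 4))                 # calibration pass
torch.quantization.convert(model, inplace=True)

qweight = model.fc.weight()               # quantized weight tensor
print(qweight.dtype)                      # torch.qint8
print(qweight.int_repr())                 # the raw int8 values that are stored
```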
1
vote
0 answers

SnapML model issue on Lens Studio

I built a custom model for classifying images of cars using Tensorflow and Keras, to use it for building a Snap lens powered by machine learning. Lens Studio only accepts quantized models; the model had to go through the quantization process using…
1
vote
1 answer

Gridding and labeling 3D points

I have a matrix of 3D points. I want to quantize them into a spatial grid and later represent that matrix by their grid numbers. Let's say the matrix is D: D = np.array([[-45.08341177, 34.40457052, 7.63253164], [-46.81391587, 34.35034554, …
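A minimal numpy sketch of one way to do this (the cell size and example points are placeholders, not the asker's data): floor-divide by the cell size to get per-axis indices, then collapse each unique (i, j, k) triple into a single label.

```python
# Sketch: quantize 3D points onto a spatial grid and label each point by
# its cell. cell_size and the example points are illustrative only.
import numpy as np

D = np.array([
    [-45.08, 34.40, 7.63],
    [-46.81, 34.35, 7.59],
    [ 12.30, -3.10, 0.42],
])

cell_size = 5.0
ijk = np.floor((D - D.min(axis=0)) / cell_size).astype(int)  # per-axis cell indices

# Give every occupied cell one integer id and label each point with it.
cells, labels = np.unique(ijk, axis=0, return_inverse=True)
print(ijk)      # (i, j, k) grid coordinates per point
print(labels)   # one grid number per point
```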
1
vote
0 answers

Is there a problem with DCT or quantization when encoding JPEG files?

When I built my own BMP-to-JPEG encoder, I found that the resulting JPEG image was very different from the original. After comparing the RGB images to the YUV images, I found that the two images were exactly the same. Instead of optimizing the DCT…
TIger_zh
  • 11
  • 2
1
vote
1 answer

How does PyTorch implement forward for a quantized linear layer?

I have a quantized model in PyTorch and now I want to extract the parameters of the quantized linear layer and implement the forward pass manually. I searched the source code but only found this function: def forward(self, x: torch.Tensor) -> torch.Tensor: …
stephen
  • 11
  • 1
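A hedged sketch of reproducing such a forward pass by hand: build a small statically quantized model, then redo the Linear step with the extracted scale/zero-point and dequantized weights. This mirrors the arithmetic only; the fused int8 kernel rounds slightly differently, so the comparison is approximate.

```python
# Sketch: manual forward for a quantized nn.Linear using its extracted
# parameters. Matches the module's output only up to requantization error.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(4, 3)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
model(torch.randn(64, 4))                          # calibration
torch.quantization.convert(model, inplace=True)

x = torch.randn(2, 4)
reference = model(x)

# Manual path: quantize/dequantize the input like QuantStub, run a float
# matmul with the dequantized int8 weights, then requantize to the layer's
# output scale/zero-point and dequantize like DeQuantStub.
x_dq = torch.quantize_per_tensor(
    x, float(model.quant.scale), int(model.quant.zero_point), torch.quint8
).dequantize()
w = model.fc.weight().dequantize()
y = F.linear(x_dq, w, model.fc.bias())
y_q = torch.quantize_per_tensor(
    y, float(model.fc.scale), int(model.fc.zero_point), torch.quint8
)
manual = y_q.dequantize()

print(torch.allclose(reference, manual, atol=float(model.fc.scale)))
```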