
I am having a CUDA out-of-memory issue while running this code:

from haystack.nodes import PromptNode

prompt_node = PromptNode(model_name_or_path='google/flan-t5-xl',
                         default_prompt_template=lfqa_prompt,
                         use_gpu=True,
                         max_length=300)

I have tried to resolve the CUDA issue. The retriever runs fine on the GPU; the problem occurs only when the PromptNode uses the GPU. Any suggestions on how to fix it?

The error is:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 14.85 GiB total capacity; 4.02 GiB already allocated; 17.44 MiB free; 4.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.

  • flan-t5-xl has 3 billion parameters -> requires about 12 GB of VRAM. If you have already loaded something onto the GPU (4 GB already allocated), this big model won't fit. Possible strategies: 1) get a bigger GPU; 2) free some GPU memory; 3) model quantization: https://huggingface.co/blog/hf-bitsandbytes-integration For more help, come to the Discord channel: https://discord.gg/haystack – Stefano Fiorucci - anakin87 Jun 21 '23 at 16:27
  • Can you explain how to do model quantization in the prompt node? – sherin_a27 Jun 22 '23 at 02:09
  • You can check out these notebooks (by Hugging Face), not specifically related to PromptNode: https://colab.research.google.com/drive/1YORPWx4okIHXnjW7MSAidXN29mPVNT7F?usp=sharing https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/HuggingFace_int8_demo.ipynb – Stefano Fiorucci - anakin87 Jun 22 '23 at 06:48
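
As a quick sketch of the "free some GPU memory" suggestion above (assuming PyTorch is the only framework holding GPU memory in the process), you can inspect and release what is already resident before loading the model:

import torch

# Inspect what is already on the GPU before loading the model
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")

# Release cached blocks held by PyTorch's allocator; this frees reserved
# memory, but cannot free tensors that are still referenced elsewhere
torch.cuda.empty_cache()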

1 Answer


For the model you are using, 'google/flan-t5-xl', there are smaller alternatives, such as 'google/flan-t5-small' or 'google/flan-t5-base'. They require less memory, and that would be my suggestion here.
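
For example (a minimal sketch, assuming the same lfqa_prompt template as in the question), the only change needed is the model name:

from haystack.nodes import PromptNode

prompt_node = PromptNode(model_name_or_path='google/flan-t5-base',
                         default_prompt_template=lfqa_prompt,
                         use_gpu=True,
                         max_length=300)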

Quantization would be a different approach. Haystack doesn't support quantization out of the box yet, but I believe it wouldn't be too difficult to add, so maybe you can make a feature request through a GitHub issue?
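
To illustrate the quantization route from the comments above, here is a rough sketch outside of Haystack, using Hugging Face transformers (with bitsandbytes and accelerate installed); it is not something PromptNode supports directly:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# load_in_8bit quantizes the weights via bitsandbytes, roughly quartering
# the VRAM footprint of the 3B-parameter model (~12 GB fp32 -> ~3 GB int8)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl",
    device_map="auto",   # let accelerate place layers on GPU/CPU
    load_in_8bit=True,
)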

In the particular error message you posted, it seems that not all of the GPU memory is being used: for some reason the process appears limited to about 4 GiB of the 14.85 GiB total. It could well be that this is not related to the model but to a bug in torch or in the execution environment. Have you tried running it in a fresh environment? You might also want to check whether your problem is similar to one of the following torch issues: https://github.com/pytorch/pytorch/issues/40002 or https://github.com/pytorch/pytorch/issues/67680
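
If fragmentation is part of the problem, the error message itself points to a knob worth trying: set max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF before CUDA is initialized (the 128 MiB value below is just a starting-point assumption):

import os

# Must be set before the first CUDA allocation in the process; equivalently,
# in the shell: export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"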

– Julian Risch