Questions tagged [llama]

LLaMA (Large Language Model Meta AI) is a large language model (LLM) released by Meta AI.

55 questions
0
votes
1 answer

Getting Peft Version Error while Autotrain Finetune on Llama 2

I did some Llama 2 fine-tuning with autotrain on Google Colab. This is a sample text column for fine-tuning: ###Human: Here is the OCR text extracted from a VHS tape cover. Yes, the text is surely extracted from a VHS tape, but it may have some…
SoajanII
  • 323
  • 5
  • 19
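A common first step for this class of autotrain failure is pinning package versions; a minimal sketch for a Colab cell, assuming the error is a peft/autotrain version mismatch (the pinned version below is illustrative, not verified):

    # Colab cell: reinstall autotrain and pin peft; the exact version to pin
    # depends on what the autotrain error message reports as incompatible.
    !pip install -q -U autotrain-advanced
    !pip install -q "peft==0.5.0"   # illustrative pin, adjust to the reported version
    import peft
    print(peft.__version__)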
0
votes
0 answers

Fine-tune llama2 on cuda:1

When I load the model I use device_map to put it on cuda:1, but it still seems that the model and the training run on different devices. How should I do this properly? Code running on a Tesla T4 below: # load the base model in 4-bit quantization bnb_config =…
user1564762
  • 745
  • 2
  • 11
  • 18
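One way to keep both the weights and the training step on the same card is to pin the whole model to a single device; a minimal sketch, assuming a 4-bit QLoRA-style setup like the one in the question (the model id is illustrative):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",   # illustrative model id
        quantization_config=bnb_config,
        device_map={"": 1},           # place every module on cuda:1
    )
    print(next(model.parameters()).device)  # expected: cuda:1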
0
votes
0 answers

How to improve my prompt while using meta-llama/Llama-2-13b-chat-hf

When I use meta-llama/Llama-2-13b-chat-hf, the answers the model gives are not good. I think I am using the wrong prompt. Below is my code: from langchain.embeddings import HuggingFaceEmbeddings from langchain.text_splitter import…
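The Llama-2-chat models were tuned on a specific instruction template, and answer quality degrades noticeably without it; a minimal sketch of that template (the system and user messages are illustrative):

    # Build a prompt in the [INST]/<<SYS>> format the chat model was trained on.
    def build_llama2_prompt(system_msg: str, user_msg: str) -> str:
        return (
            f"<s>[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n"
            f"{user_msg} [/INST]"
        )

    prompt = build_llama2_prompt(
        "Answer using only the provided context. Say 'I don't know' otherwise.",
        "What does the document say about warranty terms?",
    )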
0
votes
1 answer

SageMaker AWS Llama 2 endpoint inference

I am calling the inference endpoint of jumpstart-llama2-foundational-model on AWS SageMaker, but it gives me the error below: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received…
ddwivedy
  • 23
  • 1
  • 4
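One frequent cause of ModelError with JumpStart Llama 2 endpoints is a missing EULA flag or a payload in the wrong shape; a minimal boto3 sketch, assuming the chat payload format used by the JumpStart Llama 2 models:

    import json
    import boto3

    client = boto3.client("sagemaker-runtime")
    payload = {
        "inputs": [[{"role": "user", "content": "Hello, Llama!"}]],
        "parameters": {"max_new_tokens": 256, "temperature": 0.6},
    }
    response = client.invoke_endpoint(
        EndpointName="jumpstart-llama2-foundational-model",  # endpoint from the question
        ContentType="application/json",
        Body=json.dumps(payload),
        CustomAttributes="accept_eula=true",  # JumpStart Llama 2 rejects calls without this
    )
    print(json.loads(response["Body"].read()))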
0
votes
1 answer

How to fix the GPT2 tokenizer error in LangChain map_reduce (Llama 2)?

I'm using the AWS SageMaker JumpStart model for Llama 2 13b: meta-textgeneration-llama-2-13b-f. On running a LangChain summarize chain with chain_type="map_reduce" I get the error below. I do not have access to https://huggingface.co from my environment.…
apprunner2186
  • 217
  • 1
  • 6
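The map_reduce chain counts tokens via LangChain's default GPT-2 tokenizer, which it downloads from huggingface.co; a minimal sketch of one workaround for an offline environment, overriding the counter with a local heuristic (class and method names per the LangChain 0.0.x releases current for this question):

    from langchain.llms import SagemakerEndpoint

    class OfflineSagemakerEndpoint(SagemakerEndpoint):
        def get_num_tokens(self, text: str) -> int:
            # Rough offline estimate (~4 characters per token for English text),
            # avoiding any download of the GPT-2 tokenizer.
            return max(1, len(text) // 4)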
0
votes
0 answers

Why is there a "rope.freqs" variable in the llama-2-7b weights?

I noticed a weight called "rope.freqs" in the weights of the llama2 models (e.g. llama-2-7b or llama-2-7b-chat). What is the function of this weight, and which part of the model does it correspond to? In [14]: checkpoint =…
hsc
  • 1,226
  • 11
  • 24
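In Meta's reference implementation, rope.freqs stores the precomputed inverse frequencies for rotary position embeddings (RoPE), shared by every attention layer rather than belonging to any one block; a minimal sketch of how the stored value is derived:

    import torch

    # llama-2-7b: hidden size 4096 over 32 heads -> head_dim 128; theta = 10000.
    head_dim, theta = 128, 10000.0
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    print(freqs.shape)  # torch.Size([64]) -- the shape of rope.freqs in the checkpoint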
0
votes
0 answers

Running Llama 2 on a GeForce 1080 8 GB machine

I am trying to run Llama 2 on my server, which has the NVIDIA card mentioned above. It's a simple hello-world case you can find here. However, I constantly run into memory issues: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00…
wonglik
  • 1,043
  • 4
  • 18
  • 36
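A 7B model needs roughly 14 GB of VRAM in fp16 (28 GB in fp32), so it cannot fit in 8 GB unquantized; a minimal sketch of 4-bit loading, with the caveat that bitsandbytes support on a Pascal-era GTX 1080 is not guaranteed:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,                     # ~4 GB for 7B weights
            bnb_4bit_compute_dtype=torch.float16,
        ),
        device_map="auto",
    )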
0
votes
0 answers

How to load the finetuned model (merged weights) on colab?

I have fine-tuned the llama2 model, reloaded the base model, and merged the LoRA weights. I saved this final merged model, and now I intend to run it. base_model = AutoModelForCausalLM.from_pretrained( model_name, …
Gaurav Gupta
  • 4,586
  • 4
  • 39
  • 72
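A minimal sketch of reloading the merged checkpoint on Colab, assuming the merged weights were saved with save_pretrained to a local directory (the path is illustrative):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    merged_dir = "./llama2-merged"       # hypothetical save location
    model = AutoModelForCausalLM.from_pretrained(
        merged_dir,
        torch_dtype=torch.float16,       # halves memory vs. the fp32 default
        device_map="auto",               # let accelerate fit layers to Colab's GPU
    )
    tokenizer = AutoTokenizer.from_pretrained(merged_dir)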
0
votes
0 answers

Hugging Face model: git clone stuck at "Filtering content: 40% (2/5)" when cloning repository

When trying to git clone the https://huggingface.co/chavinlo/alpaca-native repository, the download gets stuck at the status "Filtering content: 40% (2/5)" and no…
Bharat
  • 3
  • 3
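"Filtering content" is the git-LFS smudge step, which is where large-file downloads tend to stall; a minimal sketch of an alternative that skips git-LFS entirely and uses resumable HTTP downloads:

    from huggingface_hub import snapshot_download

    # Downloads every file in the repo to the local HF cache and returns the path.
    local_dir = snapshot_download(repo_id="chavinlo/alpaca-native")
    print(local_dir)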
0
votes
0 answers

How do I save checkpoints when using the Hugging Face SFTTrainer?

Hey, I'm trying to fine-tune Llama 2 and I can't see where the checkpoints are being saved. I am using the following code: output_dir = "./Llama-2-7b-hf-qlora" training_args = TrainingArguments( output_dir=output_dir, …
johnny
  • 51
  • 4
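Checkpoints only appear under output_dir when the TrainingArguments ask for them; a minimal sketch with illustrative save settings (SFTTrainer accepts these TrainingArguments unchanged):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./Llama-2-7b-hf-qlora",
        save_strategy="steps",    # write checkpoint-<step>/ folders into output_dir
        save_steps=100,           # illustrative interval
        save_total_limit=3,       # keep only the three most recent checkpoints
    )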
0
votes
0 answers

Do I need a GPU to run the Llama 2 example Python code after downloading the model from Meta?

I downloaded the Llama 2 model from Meta, but I can't run it and have the impression that I need a GPU. I don't currently have a GPU, and the example is not working. Bard insists that I don't need a GPU, and I also read that I can download a .cpp app that…
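The ".cpp app" the question alludes to is llama.cpp, which runs quantized GGUF/GGML builds of the model on CPU only; a minimal sketch via its llama-cpp-python bindings (the model path is a hypothetical local file):

    from llama_cpp import Llama

    # CPU inference by default; no CUDA required.
    llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)
    out = llm("Why is the sky blue?", max_tokens=64)
    print(out["choices"][0]["text"])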
0
votes
0 answers

Chat with a spreadsheet using meta-llama/Llama-2-13b-chat-hf

I made a spreadsheet containing around 2000 question-answer pairs and use the meta-llama/Llama-2-13b-chat-hf model. But when I start querying the spreadsheet with this model, it gives wrong answers most of the time and also repeats them many…
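Repetition of this kind is often a decoding problem rather than a model problem; a minimal sketch, assuming a chat-formatted prompt, that adds sampling and a repetition penalty:

    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-2-13b-chat-hf",
        device_map="auto",
    )
    prompt = "[INST] Using only these rows, answer: what is the return policy?\n<spreadsheet rows here> [/INST]"
    out = generator(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.15,   # values > 1.0 discourage repeated tokens
    )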
0
votes
0 answers

Running into CUDA out of memory when running the llama2-13b-chat model on a multi-GPU machine

I'm trying to run the llama2 13b model with RoPE scaling on an AWS g4dn.12xlarge machine, which has 4 GPUs with 16 GB VRAM each, but I get a CUDA out-of-memory error. Code: from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline import…
srk
  • 1
  • 1
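A 13B model in fp16 is about 26 GB of weights, so it has to shard across the four 16 GB T4s rather than load onto one; a minimal sketch using per-GPU memory caps (the cap values are illustrative headroom):

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-13b-chat-hf",
        torch_dtype=torch.float16,                  # avoid the fp32 default (~52 GB)
        device_map="auto",                          # shard layers across all GPUs
        max_memory={i: "14GiB" for i in range(4)},  # leave headroom on each 16 GB card
    )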
0
votes
1 answer

ONNX Adapter Transformer Support

Does a model saved to ONNX/TensorRT format support plugging in adapter models, just like we can do in PyTorch or TensorFlow? If there is a proper tutorial, or if someone can provide details, it would be appreciated. I have tried to find in…
Sajal
  • 96
  • 1
  • 5
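An exported ONNX graph is static, so adapters generally cannot be swapped in at runtime the way PEFT allows in PyTorch; the usual route is to merge the adapter into the base weights before export. A minimal sketch (model and adapter paths are illustrative):

    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    merged = PeftModel.from_pretrained(base, "./my-lora-adapter").merge_and_unload()
    merged.save_pretrained("./merged-for-export")
    # Then export the merged model, e.g. with Hugging Face Optimum:
    #   optimum-cli export onnx --model ./merged-for-export onnx_out/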
0
votes
0 answers

How to use decapoda-research/llama-7b-hf with LoRA fine-tuning in llama.cpp?

I have fine-tuned the decapoda-research/llama-7b-hf model with the tool https://github.com/zetavg/LLaMA-LoRA-Tuner. Now I am trying to use it in llama.cpp following this tutorial: https://github.com/ggerganov/llama.cpp/discussions/1166. As far as I know, I need…
Khoi V
  • 612
  • 8
  • 13
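Per the linked discussion there are two common routes: convert the adapter for llama.cpp's --lora flag, or merge it into the HF weights and convert the result; a minimal sketch of the merge route (script and flag names reflect llama.cpp circa mid-2023 and may have changed):

    # Route 1 (adapter kept separate, shell steps shown as comments):
    #   python convert-lora-to-ggml.py ./lora-output-dir
    #   ./main -m ./ggml-model.bin --lora ./lora-output-dir/ggml-adapter-model.bin
    # Route 2 (merge, then convert like a plain HF checkpoint):
    from peft import PeftModel
    from transformers import LlamaForCausalLM

    base = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
    merged = PeftModel.from_pretrained(base, "./lora-output-dir").merge_and_unload()
    merged.save_pretrained("./llama-7b-merged")  # then run llama.cpp's convert.py on it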