Questions tagged [large-language-model]

Use this tag for questions about large language models (LLMs): deep-learning models trained to interpret and generate natural-language text.

118 questions
1 vote · 0 answers

How to restrict out-of-context search in LangChain

I want to restrict the query search to my custom documents for the LLM, but it is showing out-of-context results as well, as shown in the image below. My code is below: for token generation max_input_size = 4096 num_outputs = 512 max_chunk_overlap =…
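For reference, a minimal sketch of one common mitigation, assuming a classic LangChain RetrievalQA setup where `llm` and `vectorstore` already exist (both are placeholders here, not names from the question): give the "stuff" chain a prompt that forbids answering from outside the retrieved context.

```python
# Sketch: constrain the chain with a prompt that only allows answers
# grounded in the retrieved documents. `llm` and `vectorstore` are placeholders.
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

template = """Answer the question using ONLY the context below.
If the answer is not contained in the context, reply "I don't know."

Context:
{context}

Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa = RetrievalQA.from_chain_type(
    llm=llm,                                                      # any LangChain LLM wrapper
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),   # placeholder vector store
    chain_type_kwargs={"prompt": prompt},
)

print(qa.run("A question about the custom documents"))
```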
1 vote · 1 answer

Why does LLM (LLaMA) loss drop staircase-like over epochs?

I'm training an LLM (LLaMA-6B) and have noticed that its loss seems to drop in a stair-like fashion over the epochs: there is little loss change within an epoch, and then the loss suddenly drops quite a bit when a new epoch starts. I'm…
1 vote · 1 answer

How does Hugging Face's zero-shot classification work in production/web apps? Do I need to train the model first?

I have already used Hugging Face's zero-shot classification: I used the "facebook/bart-large-mnli" model as described here (https://huggingface.co/tasks/zero-shot-classification). The accuracy is quite good for my task. My question is about…
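For context, a minimal sketch of how the zero-shot pipeline is typically served: the pretrained NLI checkpoint is used as-is, so no task-specific training is required before wrapping it in a web app (the example text and labels below are illustrative).

```python
# Sketch: the zero-shot pipeline reuses the pretrained NLI model directly,
# so it can be loaded once at app start-up and called per request.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "I just bought a new laptop and the battery life is amazing.",
    candidate_labels=["electronics", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```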
1 vote · 1 answer

How many neurons (units) are there in the BERT model?

How to estimate the number of neurons (units) in the BERT model? Note this is different from the number of model parameters.
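One possible way to produce such an estimate, under the assumption that "neurons" means the hidden units of each encoder layer rather than trainable parameters, is to read the architecture hyper-parameters from the config; this is a sketch, not the only reasonable definition.

```python
# Rough estimate of "units" from the architecture hyper-parameters, assuming
# "neurons" means hidden activations per layer, not trainable parameters.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("bert-base-uncased")

# each encoder layer has a hidden_size-wide output and an
# intermediate_size-wide feed-forward layer
units_per_layer = cfg.hidden_size + cfg.intermediate_size
total_units = cfg.num_hidden_layers * units_per_layer

print(cfg.hidden_size, cfg.intermediate_size, cfg.num_hidden_layers)
print("approximate number of units:", total_units)  # 12 * (768 + 3072) = 46,080
```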
1 vote · 1 answer

Why does the bart-large-cnn summarization model give odd output with different length settings?

I have a piece of text of 4226 characters (316 words plus special characters). I am trying different combinations of min_length and max_length to get a summary: print(summarizer(INPUT, max_length=1000, min_length=500, do_sample=False)) With the…
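A relevant detail, sketched below with illustrative length values: min_length and max_length are counted in tokens, not characters, so forcing a min_length of 500 on a roughly 300-word input pushes the model past what it can say and tends to produce padding or repetition; lengths well below the input's token count behave better.

```python
# Sketch: lengths are measured in tokens, not characters; keep them well
# below the input's token count. The numbers here are example values.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = "..."  # the ~316-word input from the question

summary = summarizer(text, max_length=130, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```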
0 votes · 0 answers

Cryptic CUDA error when fine-tuning a sequence classification model

I am working on fine-tuning Llama 2 7B for sequence classification using QLoRA. I am using a single A100 GPU and get the same cryptic CUDA error even when increasing to multiple GPUs, increasing CPU memory, and using a batch size of 1. This is the…
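Not a definitive fix, but a common debugging sketch for this kind of error: device-side asserts during sequence classification are frequently caused by label ids outside num_labels or a missing pad token, and CUDA_LAUNCH_BLOCKING=1 makes the real failing operation appear in the traceback. `dataset` and `model` below are placeholders for the user's own objects.

```python
# Debugging sketch, not a guaranteed fix.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"   # set before any CUDA work for a usable traceback

# `dataset` and `model` are placeholders for the user's own objects
labels = set(dataset["label"])
assert max(labels) < model.config.num_labels, "a label id is outside num_labels"

# Llama has no pad token by default; sequence classification batches need one
model.config.pad_token_id = model.config.eos_token_id
```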
0 votes · 1 answer

How to directly load a fine-tuned model like Alpaca-LoRA (PeftModel()) from local files instead of loading it from Hugging Face models?

I have fine-tuned a Llama model using low-rank adaptation (LoRA) with the peft package. The resulting files adapter_config.json and adapter_model.bin are saved. I can load the fine-tuned model from Hugging Face using the following code: model =…
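A minimal sketch for this case: PeftModel.from_pretrained also accepts a local directory, so pointing it at the folder that holds adapter_config.json and adapter_model.bin avoids the Hub. The base-model id and adapter path below are placeholders.

```python
# Sketch: load the LoRA adapter from a local directory instead of the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "decapoda-research/llama-7b-hf"        # whichever base the adapter was trained on
base_model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# local directory containing adapter_config.json and adapter_model.bin
model = PeftModel.from_pretrained(base_model, "./lora-alpaca-output")
```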
0 votes · 1 answer

Getting a Peft version error during AutoTrain fine-tuning on Llama 2

I did some Llama 2 fine-tuning with AutoTrain on Google Colab. This is a sample text column for fine-tuning: ###Human: Here is the OCR text extracted from a VHS tape cover. Yes, the text is surely extracted from a VHS tape, but it may have some…
0 votes · 0 answers

LLM token embeddings

Hi, I'm just getting started with understanding transformer-based models, and I am not able to find how the token embeddings are arrived at. There are multiple tokenization approaches and multiple vocabularies/documents LLMs are trained on, so my…
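A short sketch of the underlying mechanism: token embeddings are not derived from the tokenizer itself; they are rows of a learned embedding matrix trained jointly with the rest of the model, and the tokenizer only supplies the integer ids used to index that matrix.

```python
# Sketch: the embedding layer is a plain lookup table (nn.Embedding);
# the tokenizer maps text to the ids used to index it.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

ids = tokenizer("token embeddings", return_tensors="pt")["input_ids"]
embedding_layer = model.get_input_embeddings()   # torch.nn.Embedding(vocab_size, hidden_dim)
vectors = embedding_layer(ids)                   # pure lookup, no extra computation

print(embedding_layer.weight.shape)              # (50257, 768) for GPT-2
print(vectors.shape)                             # (1, seq_len, 768)
```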
0 votes · 0 answers

Questions about distributed fine-tuning of a transformers model (ChatGLM) with Accelerate on Kaggle GPUs

I am trying to fine-tune the chatglm-6b model using LoRA with transformers and peft on Kaggle GPUs (2×T4). The model structure: the traditional loading method (AutoModel.from_pretrained) needs to load the model itself (15 GB) onto the CPU first, whereas…
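One possible way around the CPU-first load, sketched under the assumption that accelerate (and, for the optional 8-bit flag, bitsandbytes) is installed: let from_pretrained build the model empty and stream the shards directly onto the two T4s with a device map.

```python
# Sketch: shard the checkpoint across both GPUs at load time instead of
# materialising the full 15 GB model on the CPU. The 8-bit flag is optional.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,   # chatglm ships custom modelling code
    device_map="auto",        # accelerate places the layers across both GPUs
    load_in_8bit=True,        # optional: 8-bit weights to fit 2 x 16 GB T4s
)
```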
0 votes · 0 answers

How to load the fine-tuned model (merged weights) on Colab?

I have fine-tuned the Llama 2 model, reloaded the base model, and merged the LoRA weights. I saved this final model again and now I intend to run it. base_model = AutoModelForCausalLM.from_pretrained( model_name, …
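A sketch of the usual flow, with placeholder names (model_name, adapter_dir, the output path): once the LoRA weights are merged and saved, the result is a plain causal-LM checkpoint that loads without peft.

```python
# Sketch: merge once, save, then reload as an ordinary checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# one-time merge (after fine-tuning); model_name and adapter_dir are placeholders
base = AutoModelForCausalLM.from_pretrained(model_name)
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained("merged-llama2")

# later, e.g. in a fresh Colab session
model = AutoModelForCausalLM.from_pretrained("merged-llama2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
```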
0 votes · 0 answers

Get the positive score in a classification task by using a generative model

I'm attempting to use a generative model (Llama 2) for a binary classification task and aim to obtain the positive score, which represents the confidence level for the positive label. I tried to use compute_transition_scores but am not sure how I can…
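A hedged sketch of one way to use compute_transition_scores for this, assuming the prompt is phrased so the model answers with a single label token; `model`, `tokenizer`, and `prompt` are placeholders. Comparing the logits of the two label tokens directly is another common option.

```python
# Sketch: generate with scores and convert the per-step log-probabilities
# into a probability for the generated label token.
import torch

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=1,                 # expect a single label token, e.g. "yes"/"no"
    return_dict_in_generate=True,
    output_scores=True,
)

scores = model.compute_transition_scores(
    out.sequences, out.scores, normalize_logits=True
)                                      # log-probabilities of the generated tokens

label_token = tokenizer.decode(out.sequences[0, -1:])
confidence = torch.exp(scores[0, 0]).item()   # probability of the generated label token
print(label_token, confidence)
```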
0 votes · 0 answers

Speed up an LLM in LangChain

My project is a natural-language-based search engine. I don't use an eGPU or an M1/M2. Here is a part of my code: import os from typing import Any, List # from llm import CustomLLM from langchain.chains import RetrievalQA from…
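Two cheap levers when everything runs on the CPU, sketched with placeholder objects (`vectorstore`, `llm`): retrieve fewer chunks so the prompt stays short, and cap how many tokens the wrapped model may generate (that limit lives on the specific LLM wrapper, e.g. its max_tokens-style parameter).

```python
# Sketch: a shorter prompt and a smaller output budget cut CPU time per query.
from langchain.chains import RetrievalQA

# fewer retrieved chunks => shorter prompt => less time spent in the model
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

qa = RetrievalQA.from_chain_type(
    llm=llm,                 # placeholder; a smaller or quantized local model also helps
    chain_type="stuff",
    retriever=retriever,
)
```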
0 votes · 1 answer

How to solve an AssertionError when loading LLaMA 2 70B in Google Colab?

I am trying to run LLaMA 2 70B in Google Colab, using a GGML file: TheBloke/Llama-2-70B-Chat-GGML. Here is my current code that I am using to run it: !pip install huggingface_hub model_name_or_path = "TheBloke/Llama-2-70B-Chat-GGML" model_basename =…
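If the GGML file is run through llama-cpp-python, one known cause of an AssertionError with the 70B checkpoints is the grouped-query attention setting: they need n_gqa=8 when the model is created. A hedged sketch follows; the exact .bin filename should be taken from the repo's file list.

```python
# Sketch: download one quantised GGML file and load it with n_gqa=8.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-70B-Chat-GGML",
    filename="llama-2-70b-chat.ggmlv3.q4_0.bin",   # pick the actual basename from the repo
)

# 70B models use grouped-query attention; older llama.cpp builds assert without n_gqa=8
llm = Llama(model_path=model_path, n_ctx=2048, n_gqa=8)
out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```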
0 votes · 1 answer

LangChain: custom output parser not working with ConversationChain

I am creating a chatbot with LangChain's ConversationChain, so it needs conversation memory. However, at the end of each of its responses it adds a new line and writes a bunch of gibberish, so I created a custom output parser to remove this…
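A minimal sketch of one workaround, with illustrative names: define the parser as usual but apply it to the chain's raw output instead of wiring it into ConversationChain, which side-steps how the chain and its memory handle parsers internally (`conversation` stands in for the existing ConversationChain).

```python
# Sketch: strip trailing gibberish by post-processing the chain output.
from langchain.schema import BaseOutputParser

class FirstLineParser(BaseOutputParser[str]):
    """Keep only the text before the first blank line / trailing gibberish."""

    def parse(self, text: str) -> str:
        return text.split("\n\n", 1)[0].strip()

parser = FirstLineParser()
raw = conversation.predict(input="Hi there!")   # `conversation` is the ConversationChain
print(parser.parse(raw))
```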