Use this tag for questions about large language models (LLMs): deep-learning models trained to interpret and generate natural-language text.
Questions tagged [large-language-model]
118 questions
1
vote
0 answers
How to restrict out of context search in LangChain
I want to restrict the query search to my custom documents for the LLM, but it is also returning out-of-context results, as shown in the image below.
My code is below:
for token generation
max_input_size = 4096
num_outputs = 512
max_chunk_overlap =…

Shubh kumar
- 39
- 5
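A common fix is to constrain the retrieval chain with a prompt that forbids answers outside the retrieved documents. The sketch below uses the legacy LangChain RetrievalQA API and assumes an existing llm and vectorstore; the prompt wording and names are illustrative, not the asker's actual code.
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Prompt that instructs the model to answer only from the retrieved context
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, reply \"I don't know.\"\n\n"
        "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    ),
)

qa = RetrievalQA.from_chain_type(
    llm=llm,                                  # assumed: an existing LLM wrapper
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),     # assumed: a vector store over the custom documents
    chain_type_kwargs={"prompt": prompt},
)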
1
vote
1 answer
Why does LLM (LLaMA) loss drop in a staircase pattern over epochs?
I'm training an LLM (LLaMA-6B) and have noticed that its loss drops in a staircase-like fashion over the epochs: the loss barely changes within an epoch, then drops sharply as soon as a new epoch begins.
I'm…

Jing zhao
- 11
- 1
1
vote
1 answer
How does Hugging Face's zero-shot classification work in production/a web app? Do I need to train the model first?
I have already used Hugging Face's zero-shot classification with the facebook/bart-large-mnli model, as described here (https://huggingface.co/tasks/zero-shot-classification). The accuracy is quite good for my task.
My question is about…

Franz
- 45
- 1
- 7
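For reference, the zero-shot pipeline runs the pretrained NLI checkpoint directly at inference time, so no extra training step is needed before putting it behind a web app. A minimal sketch with illustrative labels:
from transformers import pipeline

# Loads facebook/bart-large-mnli once; reuse the pipeline object across requests
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The battery drains within two hours of normal use.",
    candidate_labels=["hardware issue", "software issue", "billing question"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score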
1
vote
1 answer
How many neurons (units) are there in the BERT model?
How to estimate the number of neurons (units) in the BERT model?
Note this is different from the number of model parameters.

Celso França
- 653
- 8
- 31
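One rough way to estimate a unit count is from the model configuration: each encoder layer has hidden_size outputs for the attention/projection sublayer plus intermediate_size feed-forward units. This is only one possible accounting (it ignores the embeddings and the pooler), sketched here for BERT-base:
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("bert-base-uncased")
per_layer = cfg.hidden_size + cfg.intermediate_size   # 768 + 3072 = 3840 for BERT-base
total = cfg.num_hidden_layers * per_layer             # 12 * 3840 = 46080
print(per_layer, total)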
1
vote
1 answer
Why does the bart-large-cnn summarization model give odd output with different length settings?
I have a piece of text of 4,226 characters (316 words plus special characters).
I am trying different combinations of min_length and max_length to get a summary:
print(summarizer(INPUT, max_length = 1000, min_length=500, do_sample=False))
With the…

Ani
- 265
- 1
- 3
- 10
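Worth noting: max_length and min_length count tokens rather than characters, and bart-large-cnn takes at most 1024 input tokens, so forcing a 500-token minimum summary of a roughly 316-word input tends to produce padded or repetitive output. A sketch with more typical settings (values are illustrative):
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = "..."  # the ~4,226-character input
# Keep min_length well below the token length of the input itself
print(summarizer(text, max_length=130, min_length=30, do_sample=False))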
0
votes
0 answers
Cryptic CUDA error when fine-tuning a sequence classification model
I am working on fine-tuning Llama 2 7B for sequence classification using QLoRA. I am using a single A100 GPU and get the same cryptic CUDA error even when increasing to multiple GPUs, increasing CPU memory, and using a batch size of 1.
This is the…
0
votes
1 answer
How to directly load a fine-tuned model like Alpaca-LoRA (PeftModel) from local files instead of loading it from Hugging Face models?
I have fine-tuned a Llama model using low-rank adaptation (LoRA) with the peft package. The resulting files adapter_config.json and adapter_model.bin are saved.
I can load the fine-tuned model from Hugging Face using the following code:
model =…

a7777777
- 1
- 1
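PeftModel.from_pretrained accepts a local directory as well as a Hub id, so the saved adapter_config.json and adapter_model.bin can be loaded straight from disk. A minimal sketch, assuming the adapter was saved to ./lora_adapter and using a placeholder base checkpoint:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "decapoda-research/llama-7b-hf"   # assumption: whatever base model the adapter was trained on
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, "./lora_adapter")   # local path instead of a Hub id
tokenizer = AutoTokenizer.from_pretrained(base_id)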
0
votes
1 answer
Getting a Peft version error while fine-tuning Llama 2 with AutoTrain
I did some Llama 2 fine-tuning with AutoTrain on Google Colab. This is a sample text column used for fine-tuning:
###Human:
Here is the OCR Text extracted from a VHS tape cover. Yes, the text is surely extracted from a VHS tape, but it may have some…

SoajanII
- 323
- 5
- 19
0
votes
0 answers
LLM token embeddings
Hi, I'm just getting started with understanding transformer-based models, and I can't find how the token embeddings are arrived at.
There are multiple tokenization approaches and multiple vocabularies/documents LLMs are trained on, so my…

dasman
- 237
- 1
- 2
- 10
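In short, the tokenizer maps text to integer ids, and those ids index a learned embedding matrix with one row per vocabulary entry; the matrix is trained jointly with the rest of the model rather than derived from the training documents directly. A small sketch using GPT-2 for illustration:
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

ids = tok("hello world", return_tensors="pt")["input_ids"]
emb = model.get_input_embeddings()   # nn.Embedding(vocab_size, hidden_size)
vectors = emb(ids)                   # shape: (1, seq_len, hidden_size)
print(emb.weight.shape, vectors.shape)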
0
votes
0 answers
Questions about distributed fine-tuning of a transformers model (ChatGLM) with Accelerate on Kaggle GPUs
I am trying to fine-tune the chatglm-6b model using LoRA with transformers and peft on Kaggle GPUs (2×T4). The model structure:
The traditional loading method (AutoModel.from_pretrained) needs to load the model itself (15 GB) onto CPU first, whereas…

LocustNymph
- 11
- 3
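One way to avoid materializing the full 15 GB checkpoint on CPU is to let accelerate place the weights directly on the GPUs via device_map. A sketch under assumed memory limits for the two T4s (the limits and dtype are assumptions, not the asker's settings):
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",                       # requires the accelerate package
    max_memory={0: "14GiB", 1: "14GiB"},     # assumed per-GPU budgets for 2x T4
)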
0
votes
0 answers
How to load the fine-tuned model (merged weights) on Colab?
I have fine-tuned the Llama 2 model, reloaded the base model, and merged the LoRA weights. I then saved this merged model and now intend to run it.
base_model = AutoModelForCausalLM.from_pretrained(
model_name,
…

Gaurav Gupta
- 4,586
- 4
- 39
- 72
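For context, the usual flow is to merge the adapter into the base weights with merge_and_unload(), save the result, and then reload it like any ordinary checkpoint. A minimal sketch with placeholder paths and an assumed base model:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
merged = PeftModel.from_pretrained(base, "./lora_adapter").merge_and_unload()
merged.save_pretrained("./merged_model")

# Later (e.g. on Colab) the merged weights load like a normal model directory
model = AutoModelForCausalLM.from_pretrained("./merged_model")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")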
0
votes
0 answers
Get the positive score in a classification task by using a generative model
I'm attempting to use a generative model (Llama 2) for a binary classification task and want to obtain the positive score, i.e. the confidence for the positive label.
I tried to use compute_transition_scores but I'm not sure how…

Ofir
- 590
- 9
- 19
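One common workaround is to read the next-token scores returned by generate and compare the probabilities of literal "positive" and "negative" tokens. The sketch below assumes an existing model and tokenizer and an illustrative prompt; the token strings are an assumption and may need adjusting to the model's vocabulary.
import torch

prompt = "Review: great product, works perfectly.\nSentiment (positive or negative):"
inputs = tokenizer(prompt, return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=1,
    return_dict_in_generate=True,
    output_scores=True,
    do_sample=False,
)
logits = out.scores[0][0]   # scores for the first generated token
pos_id = tokenizer(" positive", add_special_tokens=False)["input_ids"][0]
neg_id = tokenizer(" negative", add_special_tokens=False)["input_ids"][0]
probs = torch.softmax(logits[[pos_id, neg_id]], dim=-1)
print("positive score:", probs[0].item())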
0
votes
0 answers
Speed up an LLM in LangChain
My project is a natural-language-based search engine.
I don't use an eGPU or an Apple M1/M2.
Here is a part of my code
import os
from typing import Any, List
# from llm import CustomLLM
from langchain.chains import RetrievalQA
from…

Faulheit
- 116
- 7
0
votes
1 answer
How to solve an AssertionError when loading Llama 2 70B on Google Colab?
I am trying to run Llama 2 70B in Google Colab, using a GGML file: TheBloke/Llama-2-70B-Chat-GGML. Here is my current code that I am using to run it:
!pip install huggingface_hub
model_name_or_path = "TheBloke/Llama-2-70B-Chat-GGML"
model_basename =…

Hoang Cuong Nguyen
- 329
- 2
- 11
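With GGML-era llama-cpp-python, 70B checkpoints raise an assertion unless the grouped-query-attention setting n_gqa=8 is supplied. A hedged sketch; the quantized filename is an assumption and should be checked against the files listed in the repo:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-70B-Chat-GGML",
    filename="llama-2-70b-chat.ggmlv3.q4_0.bin",   # assumed filename, verify on the Hub
)
llm = Llama(model_path=model_path, n_gqa=8, n_ctx=2048)   # n_gqa=8 is required for 70B GGML
print(llm("Q: What is the capital of France? A:", max_tokens=32)["choices"][0]["text"])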
0
votes
1 answer
LangChain: custom output parser not working with ConversationChain
I am creating a chatbot with LangChain's ConversationChain, so it needs conversation memory. However, at the end of each response it adds a new line and writes a bunch of gibberish, so I created a custom output parser to remove this…

Z S
- 3
- 1
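For reference, a custom parser in the legacy LangChain API subclasses BaseOutputParser and implements parse; how it gets attached to ConversationChain (for example through the chain's output_parser field) varies across LangChain versions, so this is only a sketch of the parser itself:
from langchain.schema import BaseOutputParser

class FirstLineParser(BaseOutputParser[str]):
    """Keep only the text before the model's first newline, dropping trailing gibberish."""

    def parse(self, text: str) -> str:
        return text.split("\n", 1)[0].strip()

    @property
    def _type(self) -> str:
        return "first_line_parser"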