Questions tagged [language-model]

266 questions
2
votes
1 answer

Accessing terms statistics in Lucene 4

I have a Lucene index, and I need to access some statistics such as term collection frequency. The BasicStats class has this information; however, I could not figure out whether this class is accessible. Is it possible to access the BasicStats class in…
2
votes
2 answers

What is the most efficient way of storing language models in NLP applications?

How are language models (such as n-gram models) usually stored and updated? What kind of data structure is the most efficient way to store these models in a database?
user3017348
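One common in-memory answer to the storage question above, sketched with nothing but the standard library: key a flat dict by n-gram tuples. This is only a minimal illustration; production toolkits such as KenLM use sorted arrays or tries to cut memory, and a database-backed store would serialize these keys instead.

```python
from collections import defaultdict

def build_ngram_counts(tokens, n):
    """Count n-grams in a token list using a flat dict keyed by tuples.

    Tuple keys are hashable and simple to update incrementally; for
    very large models a trie or sorted array is more memory-efficient.
    """
    counts = defaultdict(int)
    for i in range(len(tokens) - n + 1):
        counts[tuple(tokens[i:i + n])] += 1
    return dict(counts)

tokens = "the cat sat on the mat".split()
bigrams = build_ngram_counts(tokens, 2)
```

Updating the model is then a matter of re-running the counting loop over new text and merging the dicts, which is why count-based (rather than probability-based) storage is the usual choice.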
2
votes
1 answer

How to treat &lt;s&gt; and &lt;/s&gt; in calculating a unigram LM?

I am a beginner in NLP and I'm confused about how to treat the &lt;s&gt; and &lt;/s&gt; symbols when calculating counts for a unigram model. Should I count them or just ignore them?
user3070752 • 694 • 4 • 23
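A common convention (e.g. the one in Jurafsky & Martin) is to count &lt;/s&gt; as a regular token, so the model can assign probability to sentence endings, while &lt;s&gt; serves only as conditioning context for higher-order n-grams and is not counted in unigram totals. Conventions vary by textbook, so treat this sketch as one defensible choice rather than the rule:

```python
from collections import Counter

def unigram_counts(sentences, count_end=True):
    """Count unigrams over tokenized sentences.

    </s> is counted once per sentence (if count_end is True) so the
    model reserves probability mass for ending a sentence; <s> is
    deliberately never counted, since nothing ever predicts it.
    """
    counts = Counter()
    for sent in sentences:
        counts.update(sent)
        if count_end:
            counts['</s>'] += 1
    return counts

sents = [["i", "like", "nlp"], ["i", "like", "cats"]]
c = unigram_counts(sents)
```

Whichever convention you pick, the essential thing is to apply it consistently to both the numerator counts and the total token count.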
1
vote
1 answer

Why do we add |V| in the denominator in the Add-One smoothing for n-gram language models?

In NLP, when we use the Laplace (add-one) smoothing technique, we assume that every word is seen one more time than its actual count, and the formula is as follows, where V is the size of the vocabulary. My question is: why do we add V when we are only…
heyharshal • 13 • 3
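The short answer to the question above: adding 1 to the numerator happens for every one of the |V| possible continuations, so the denominator must grow by |V| for the distribution to still sum to 1. A toy check, with made-up counts:

```python
def add_one_prob(bigram_count, context_count, vocab_size):
    """Laplace (add-one) smoothed bigram probability.

    We add 1 to each of the |V| bigram numerators for a given
    context, so the denominator must grow by |V| to keep the
    probabilities over the vocabulary summing to 1.
    """
    return (bigram_count + 1) / (context_count + vocab_size)

# Hypothetical counts for continuations of some context word,
# where the context itself was seen 10 times:
vocab = ['a', 'b', 'c']
context_count = 10
counts = {'a': 7, 'b': 3, 'c': 0}

# (7+1)/13 + (3+1)/13 + (0+1)/13 = 13/13, i.e. the smoothed
# distribution still normalizes (up to float rounding):
total = sum(add_one_prob(counts[w], context_count, len(vocab)) for w in vocab)
```

Without the +|V| in the denominator the total would be (10 + 3)/10 > 1, which is no longer a probability distribution.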
1
vote
1 answer

GPT4All Metal Library Conflict during Embedding on M1 Mac

I am trying to run GPT4All's embedding model on my M1 MacBook with the following code:

import json
import numpy as np
from gpt4all import GPT4All, Embed4All

# Load the cleaned JSON data
with open('coursesclean.json') as file:
    data =…
1
vote
1 answer

OpenAI Fine-tunes API: Why would I use LlamaIndex or LangChain instead of fine-tuning a model?

I'm just getting started with working with LLMs, particularly OpenAI's and other OSS models. There are a lot of guides on using LlamaIndex to create a store of all your documents and then query them. I tried it out with a few sample documents, but…
1
vote
1 answer

How to structure data for question-answering task to fine-tune a model with Huggingface run_qa.py example?

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFace

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']
…
1
vote
2 answers

How can I speed up a QA Langchain using load_qa_with_sources_chain?

I am currently running a QA model using load_qa_with_sources_chain(). However, when I run it with three chunks of up to 10,000 tokens each, it takes about 35 s to return an answer. I would like to speed this up. Can somebody explain what influences…
1
vote
1 answer

Why is perplexity calculation giving different results for the same input?

I'm following the Huggingface doc on calculating the perplexity of fixed-length models. I'm trying to verify that the formula works for various strings, and I'm getting odd behavior. In particular, they mention: "We don't want the log-likelihood for the…"
Penguin • 1,923 • 3 • 21 • 51
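For reference when checking results like the question above: perplexity is the exponential of the average negative log-likelihood per scored token. The subtlety with fixed-length models is which tokens count as "scored" — in sliding-window evaluation, tokens that only serve as context are excluded, so different window strides score different token sets. A minimal sketch of the definition itself, assuming you already have per-token log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """exp(-mean log-likelihood) over the tokens that were scored.

    With a sliding window over a fixed-length model, only the tokens
    predicted as targets in each window should appear in this list;
    overlap/context tokens are excluded from the average.
    """
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Sanity check: a model that assigns uniform probability 1/4 to
# every token has perplexity 4, regardless of sequence length.
lp = [math.log(0.25)] * 10
```

If two runs on the same string disagree, the usual cause is that they averaged over different token sets (or mixed natural-log and log-2 conventions), not that the formula itself changed.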
1
vote
0 answers

Not able to resolve TypeError: Transformer.forward() got an unexpected keyword argument 'labels'

I am trying to implement Chapter 10 of the book Natural Language Processing with Transformers by Lewis Tunstall. I am facing an error in this particular cell:

from transformers.optimization import get_scheduler
from accelerate import Accelerator
…
1
vote
1 answer

Langchain Chatbot with Memory + Vector Database

In Langchain, what is the suggested way to build a chatbot with memory and retrieval from a vector embedding database at the same time? The examples in the docs add memory modules to chains that do not have a vector database. Related issue.
Rexcirus • 2,459 • 3 • 22 • 42
1
vote
1 answer

Cannot allocate memory Failed to allocate when using KenLM build_binary

I have an arpa file which I created with the following command:

./lmplz -o 4 -S 1G 100m.arpa

Now I want to convert this arpa file to a binary file:

./build_binary 100m.arpa 100m.bin

And I'm getting the error: mmap.cc:225 in void…
user3668129 • 4,318 • 6 • 45 • 87
1
vote
1 answer

When using OPT-2.7B or any other natural language model, is there a way to trick it into having a conversation / give it a pre-prompt in the code?

Using this code, or a variant of it, is there anything that can be added to "trick" OPT into conversing as another user, in a style more similar to a chatbot? As of now it will either start something more similar to an article or have a conversation…
1
vote
0 answers

How to understand the bias term in language model head (when we tie the word embeddings)?

I was studying the masked language modeling codebase in Huggingface Transformers. Just a question to understand the language model head: here at the final linear layer, where we project hidden size to vocab size…
1
vote
0 answers

NaN values appear when including a new padding token in my tokenizer

I'm trying to fine-tune a DialoGPT model on a new dataset. I already processed my data correctly, and adding a new padding token to the tokenizer didn't seem to cause any issue: # my dataset:…