Highest Voted 'language-model' Questions

1

vote

0 answers

Using Theano to implement maximum likelihood learning in a neural probabilistic language model in Python

I'm trying to implement maximum likelihood learning for a neural probabilistic language model in Python, starting from the code of a log-bilinear model: https://github.com/wenjieguan/Log-bilinear-language-models/blob/master/lbl.py I used the grad function in Theano to…

python language-model

asked Nov 27 '14 at 14:41

kidstar

41
3

0

votes

0 answers

Why do unmasked tokens of a sequence change when passed through a language model?

Why does passing a sequence of tokens, say ["A", "B", "C", "D"], through a masked language model without any masking not result in the same sequence being output when you select the highest-probability tokens from the model's output logits, i.e.,…

machine-learning language-model

asked Aug 23 '23 at 19:13

Anshul

61
1
8

0

votes

0 answers

How to vectorize text data in a Pandas.DataFrame and then one_hot encode it "inside" the model

I am trying to implement a sequence model (trained to predict the next word) built on one-hot encoded vector sequences. My custom one-hot encoder works well, but as an exercise I want to do everything with TensorFlow (inspired by Deep Learning with Python,…

tensorflow nlp one-hot-encoding language-model

asked Aug 03 '23 at 11:15

x3mEr

23
6

0

votes

0 answers

With a HuggingFace trainer, how do I show the training loss versus the eval data set?

I'm running: #original training script trainer = transformers.Trainer( model=model, train_dataset=train_dataset, eval_dataset=test_dataset, #turn on the eval dataset for comparisons args=transformers.TrainingArguments( …

language-model huggingface-trainer

asked Jul 25 '23 at 12:10

Ronan McGovern

31
3

0

votes

0 answers

How to train KenLM language model for Nvidia's QuartzNet?

I am trying to train a speech-to-text model for the Armenian language using the Nvidia NeMo toolkit. After training the acoustic model, I used the provided NeMo/scripts/asr_language_modeling/ngram_lm/train_kenlm.py file to train the language…

speech-recognition nvidia language-model ctc kenlm

asked Jul 24 '23 at 03:33

arm

56
1
12

0

votes

0 answers

Python-based way to extract text from a scientific/academic paper for a language model

I am looking for a method to extract only the core text of a scientific paper. The paper is structured in paragraphs, and I only want to keep the text, without any e-mail addresses, websites, tables, or pictures. My purpose is to create a clean txt file…

python pdf text-extraction language-model

asked Jul 13 '23 at 07:58

Enes Kayacan

1
1

0

votes

1 answer

How to get the embedding of any vocabulary token in GPT?

I have a GPT model: model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device) When I send my batch to it, I can get the logits and the hidden states: out = model(batch["input_ids"].to(device), output_hidden_states=True,…

machine-learning pytorch nlp huggingface-transformers language-model

asked Jul 12 '23 at 14:09

Penguin

1,923
3
21
51

0

votes

1 answer

How to get the vector embedding of a token in GPT?

I have a GPT model: model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device) When I send my batch to it, I can get the logits and the hidden states: out = model(batch["input_ids"].to(device), output_hidden_states=True,…

machine-learning pytorch huggingface-transformers language-model

asked Jul 10 '23 at 16:06

Penguin

1,923
3
21
51

0

votes

0 answers

How to use a biomedical model from Huggingface to get text embeddings?

I have biomedical text that I'm trying to get the embeddings for using a biomedical transformer: my_text = ["Chocolate has a history of human consumption tracing back to 400 AD and is rich in polyphenols such as catechins, anthocyanidins, and pro…

machine-learning pytorch word-embedding huggingface language-model

asked Jul 01 '23 at 17:51

Penguin

1,923
3
21
51

0

votes

0 answers

How to train a language model in Huggingface with a custom loss?

I'm following Huggingface's tutorial on training a causal language model. I want to modify it such that instead of just predicting the next token, the model is also predicting a vector after some tokens corresponding to the sentiment. So for…

machine-learning pytorch huggingface-transformers huggingface language-model

asked Jun 28 '23 at 15:11

Penguin

1,923
3
21
51

0

votes

1 answer

Error while installing lmql[hf] using pip: "No matching distribution found for lmql[hf]"

I am trying to install lmql[hf] using the pip package manager in order to set up a local LMQL playground. Following the documentation, I ran the command pip install lmql[hf]. However, I encountered the following error: ERROR: Ignored the following…

python pip bert-language-model language-model large-language-model

asked Jun 25 '23 at 23:25

Pavel

1
2

0

votes

1 answer

ArrowInvalid: Column 4 named input_ids expected length 1000 but got length 328

# Formatting block_size = 128 # or any number suitable to your context def group_texts(examples): # Concatenate all 'input_ids' concatenated_examples = sum(examples["input_ids"], []) total_length = len(concatenated_examples) #…

python machine-learning language-model

asked Jun 19 '23 at 19:27

Nischal

1

0

votes

0 answers

How do I do vector embedding of words using Ruby, without making calls to a third-party API?

How do I make vector embeddings of words using Ruby, without making calls to a third-party API? I just want to do it locally for speed and cost. I can't find any good examples in Ruby.

ruby language-model

asked May 25 '23 at 17:04

Some Guy

12,768
22
58
86

0

votes

0 answers

How to compute a simple maximum likelihood LM with SRILM

I want to build a simple maximum likelihood language model (i.e. p(w|w_history) = c(w_history, w)/c(w_history), nothing else) without any tricks like smoothing. I am using a small corpus on purpose, to check that the computed numbers match with…

nlp n-gram language-model srilm

asked May 07 '23 at 12:46

peer

4,171
8
42
73
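
For reference, the unsmoothed estimate quoted in the SRILM question above, p(w|w_history) = c(w_history, w)/c(w_history), can be checked by hand on a toy corpus. The following is a minimal sketch in plain Python, not SRILM's implementation: the function name mle_bigram_probs and the toy corpus are made up for illustration, the history is assumed to be a single word (a bigram model), and no sentence-boundary tokens are added, so numbers from a real toolkit may differ at sentence edges.

```python
from collections import Counter


def mle_bigram_probs(tokens):
    """Unsmoothed MLE for a bigram model: p(w | h) = c(h, w) / c(h).

    Hypothetical helper for illustration only; h is a single-word history.
    """
    # c(h): how often h occurs as a history, i.e. is followed by another token.
    history_counts = Counter(tokens[:-1])
    # c(h, w): how often w directly follows h.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (h, w): count / history_counts[h]
        for (h, w), count in bigram_counts.items()
    }


# Toy corpus (an assumption, not data from the question).
corpus = "a b a b a c".split()
probs = mle_bigram_probs(corpus)

for (h, w), p in sorted(probs.items()):
    print(f"p({w} | {h}) = {p:.4f}")
```

Because nothing is smoothed, the probabilities for each observed history sum to one, and any bigram that never occurs in the corpus simply has no entry (probability zero).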

0

votes

1 answer

How to denoise text using T5?

I'm trying to denoise text using a T5 model following the Huggingface doc: from transformers import T5Tokenizer, T5ForConditionalGeneration tokenizer = T5Tokenizer.from_pretrained("t5-small") model =…

pytorch nlp huggingface-transformers huggingface language-model

asked May 05 '23 at 21:24

Penguin

1,923
3
21
51
