Questions tagged [language-model]
266 questions
3
votes
2 answers
Need to understand the output format of kenlm querying
The KenLM paper seems good for language modeling, but the documentation is minimal and I found it difficult to understand.
So, as part of understanding KenLM, I need to understand the output format produced when querying the model. Please provide some detail on it.
I…
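For anyone with the same question, here is a minimal sketch (not from an accepted answer) of querying a KenLM model through its Python bindings, assuming a placeholder model file test.arpa; full_scores yields one (log10 probability, matched n-gram length, OOV flag) tuple per word plus one for the end-of-sentence token:
import kenlm

# Load an ARPA or binarized KenLM model ("test.arpa" is a placeholder path).
model = kenlm.Model("test.arpa")

sentence = "language models are fun"

# Total log10 probability of the sentence, with BOS/EOS added by default.
print("total log10 prob:", model.score(sentence, bos=True, eos=True))

# Per-token breakdown: (log10 prob, length of the matched n-gram, OOV flag).
# The final tuple corresponds to the </s> token.
words = sentence.split() + ["</s>"]
for word, (logprob, ngram_len, oov) in zip(words, model.full_scores(sentence)):
    print(f"{word}: log10 p = {logprob:.4f}, matched {ngram_len}-gram, OOV = {oov}")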

Venkatarao N
- 245
- 3
- 14
2
votes
1 answer
Fine-tuning a pre-trained LLM for question-answering
Objective
My goal is to fine-tune a pre-trained LLM on a dataset about Manchester United's (MU's) 2021/22 season (they had a poor season). I want to be able to prompt the fine-tuned model with questions such as "How can MU improve?", or "What are…
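As a hedged sketch of one common route (not necessarily what the asker chose): causal-LM fine-tuning with the Hugging Face Trainer on a plain-text file about the season; the checkpoint, file name, and hyperparameters below are placeholders:
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# "mu_2021_22.txt" is a placeholder for the Manchester United season data.
dataset = load_dataset("text", data_files={"train": "mu_2021_22.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM, no masking
args = TrainingArguments(output_dir="mu-gpt2", num_train_epochs=3,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()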

Tom Bomer
- 83
- 7
2
votes
0 answers
forward() got an unexpected keyword argument 'labels'
I am trying to fine-tune Transformer-XL for language modeling.
from transformers import TransfoXLTokenizer, TransfoXLModel
tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model =…
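A likely cause, offered as an assumption rather than a confirmed answer: TransfoXLModel is the bare model and its forward() takes no labels argument; only the LM-head variant does. A minimal sketch of the switch:
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
# The LM-head model accepts labels and computes a language-modeling loss;
# the bare TransfoXLModel does not.
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])  # output object holds the LM loss(es)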

elenata24
- 21
- 2
2
votes
1 answer
Forcing transformer models to generate only some tokens from a vocab
I trained a language model (encoder-decoder) to generate text. I want to restrict the generation vocabulary of this model to a specific subset of tokens. How can I do that?
I found in generate (model.generate) function that I can pass a parameter called…
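One mechanism generate() does expose for this, sketched under the assumption that the allowed vocabulary is a fixed set of token ids: prefix_allowed_tokens_fn, which is called at every decoding step and returns the ids the model may choose from (the checkpoint and word list below are placeholders):
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")      # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Hypothetical restricted vocabulary: only these words (plus special tokens).
allowed_words = ["good", "bad", "neutral"]
allowed_ids = {tokenizer.eos_token_id, tokenizer.pad_token_id}
for w in allowed_words:
    allowed_ids.update(tokenizer(w, add_special_tokens=False).input_ids)
allowed_ids = sorted(allowed_ids)

def restrict_vocab(batch_id, input_ids):
    # Called at each generation step; returning a fixed list restricts every position.
    return allowed_ids

inputs = tokenizer("classify: the movie was great", return_tensors="pt")
out = model.generate(**inputs, prefix_allowed_tokens_fn=restrict_vocab, max_new_tokens=5)
print(tokenizer.decode(out[0], skip_special_tokens=True))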

Minions
- 5,104
- 5
- 50
- 91
2
votes
1 answer
How to use arpa file in voice recognition
I have created an ARPA file from a text file using the CMU SLM toolkit.
Currently I don't know how to use the generated ARPA file in my project instead of the .lm and .dic files.
If anyone knows how to do this, please let me know.
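One common way to wire this up, sketched here as an assumption (the pocketsphinx Python bindings can load a text ARPA language model via the -lm option; import paths vary between pocketsphinx versions, and all file paths below are placeholders):
from pocketsphinx.pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string("-hmm", "en-us")            # acoustic model directory (placeholder)
config.set_string("-lm", "my_model.arpa")     # the generated ARPA language model
config.set_string("-dict", "my_dict.dic")     # pronunciation dictionary (placeholder)
decoder = Decoder(config)

with open("utterance.raw", "rb") as f:        # 16 kHz, 16-bit mono PCM audio (placeholder)
    decoder.start_utt()
    decoder.process_raw(f.read(), False, True)
    decoder.end_utt()

print(decoder.hyp().hypstr if decoder.hyp() else "no hypothesis")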

ravoorinandan
- 783
- 2
- 10
- 26
2
votes
1 answer
Word embeddings with Google's T5?
Is it possible to generate word embeddings with Google's T5?
I'm assuming that this is possible. However, I cannot find the code I would need to be able to generate word embeddings on the relevant Github…
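A sketch of two plausible readings of "word embeddings", assuming the Hugging Face T5 implementation: the static input embedding matrix, or contextual vectors from the encoder alone via T5EncoderModel:
import torch
from transformers import T5EncoderModel, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5EncoderModel.from_pretrained("t5-small")

ids = tokenizer("word embeddings are useful", return_tensors="pt").input_ids

# Option 1: static, non-contextual token embeddings from the shared matrix.
static = model.get_input_embeddings()(ids)               # (1, seq_len, d_model)

# Option 2: contextual embeddings from the encoder's last hidden state.
with torch.no_grad():
    contextual = model(input_ids=ids).last_hidden_state  # (1, seq_len, d_model)

print(static.shape, contextual.shape)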

mcr
- 43
- 3
2
votes
1 answer
How is BERT bidirectional?
The BERT encoder takes the input and passes it through multi-head attention. But how does it maintain word order, since the current word does not depend on the sequence of previous words? Besides, why is it called bidirectional? Does it maintain forward and backward sequence…
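The bidirectionality is easiest to see empirically: masked-token prediction uses context on both sides of the mask, while word order comes from learned position embeddings rather than recurrence. A small illustration, assuming bert-base-uncased:
from transformers import pipeline

# Fill-mask uses the whole sentence, left AND right of [MASK], to predict it.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("The man went to the [MASK] to buy a gallon of milk."):
    print(pred["token_str"], round(pred["score"], 3))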

kowser66
- 125
- 1
- 8
2
votes
1 answer
BERT with Padding and Masked Token Prediction
I am playing around with pretrained BERT models (bert-large-uncased-whole-word-masking).
I used Hugging Face to try it. I first used this piece of code:
m = TFBertLMHeadModel.from_pretrained("bert-large-cased-whole-word-masking")
logits =…
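A sketch of how padding and masked-token prediction usually fit together, assuming the masked-LM head (TFBertForMaskedLM) rather than the causal TFBertLMHeadModel; the attention_mask produced by the tokenizer keeps padded positions from influencing the prediction:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForMaskedLM

name = "bert-large-uncased-whole-word-masking"
tokenizer = BertTokenizer.from_pretrained(name)
model = TFBertForMaskedLM.from_pretrained(name)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="tf", padding="max_length", max_length=16)

# inputs["attention_mask"] is 1 for real tokens and 0 for padding, so the
# padded positions are ignored by self-attention.
logits = model(**inputs).logits

mask_index = int(tf.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0, 0])
predicted_id = int(tf.argmax(logits[0, mask_index]))
print(tokenizer.decode([predicted_id]))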

Jeyadevan Rajan
- 23
- 5
2
votes
1 answer
pip install a spacy language model in a particular folder
I would like to pip install several language models into a particular folder, different from the default one.
How should I proceed?
The following does not seem to work:
pip install…
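One approach that is often suggested, sketched here with placeholder paths rather than as a confirmed answer: install the model package into a custom folder with pip's --target option, then make that folder importable before loading the model:
# In the shell, install the model wheel into a custom folder, for example:
#   pip install --target=/opt/spacy_models <model wheel URL from the spacy-models releases>

import sys
sys.path.insert(0, "/opt/spacy_models")   # placeholder folder

import spacy
nlp = spacy.load("en_core_web_sm")        # now resolved from the custom folder
print([t.text for t in nlp("This is a test.")])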

JFerro
- 3,203
- 7
- 35
- 88
2
votes
1 answer
How is the transformer's loss calculated for blank token predictions?
I'm currently trying to implement a transformer and have trouble understanding its loss calculation.
My encoder's input, for batch_size=1 and max_sentence_length=8, looks like:
[[Das, Wetter, ist, gut, , , , ]]
My decoder's…
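The usual answer, sketched under the assumption that the blank positions are a dedicated padding token: cross-entropy simply skips those target positions, e.g. via ignore_index in PyTorch (or an explicit mask), so they contribute neither loss nor gradient:
import torch
import torch.nn as nn

PAD_ID = 0                                   # hypothetical id of the blank/padding token
batch, seq_len, vocab_size = 1, 8, 10

logits = torch.randn(batch, seq_len, vocab_size)                       # decoder outputs
targets = torch.tensor([[4, 7, 2, 5, PAD_ID, PAD_ID, PAD_ID, PAD_ID]])

# Targets equal to ignore_index are skipped; the mean is over real tokens only.
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_ID)
loss = loss_fn(logits.view(-1, vocab_size), targets.view(-1))
print(loss)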

dunky11
- 79
- 6
2
votes
2 answers
Modifying the Learning Rate in the middle of the Model Training in Deep Learning
Below is the code that configures the TrainingArguments from the Hugging Face transformers library to fine-tune the GPT-2 language model.
training_args = TrainingArguments(
output_dir="./gpt2-language-model", #The output directory
…
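Two routes that are commonly suggested, sketched here as assumptions rather than an accepted answer: restart from the last checkpoint with a new learning_rate in TrainingArguments, or hand the Trainer an explicit optimizer/scheduler pair via its optimizers argument (model and train_ds below stand in for the asker's objects):
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from transformers import Trainer, TrainingArguments

# Route 1: resume from a saved checkpoint with a lower learning rate.
training_args = TrainingArguments(
    output_dir="./gpt2-language-model",
    learning_rate=1e-5,            # new value for the remaining epochs
    num_train_epochs=5,
)
# trainer = Trainer(model=model, args=training_args, train_dataset=train_ds)
# trainer.train(resume_from_checkpoint=True)

# Route 2: pass your own optimizer and scheduler so the learning-rate schedule
# stays under your control for the whole run.
# optimizer = AdamW(model.parameters(), lr=5e-5)
# scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 0.95 ** (step // 100))
# trainer = Trainer(model=model, args=training_args, train_dataset=train_ds,
#                   optimizers=(optimizer, scheduler))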

Woody
- 930
- 9
- 23
2
votes
0 answers
How to train a keras tokenizer on a large corpus that doesn't fit in memory?
I am trying to train a language model that, given a 2-word input, predicts a 1-word output. This is the model definition (all the layers are imported from keras.layers):
model = Sequential()
model.add(Embedding(vocab_size, 2,…
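One workaround, offered as an assumption about how the Keras Tokenizer behaves: fit_on_texts only iterates over whatever it is given, so a generator that streams the corpus line by line avoids loading it all into memory ("corpus.txt" is a placeholder path):
from tensorflow.keras.preprocessing.text import Tokenizer

def stream_lines(path):
    # Yield the corpus one line at a time instead of reading it all at once.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line

tokenizer = Tokenizer()
# fit_on_texts loops over its argument, so only one line is in memory at a time.
tokenizer.fit_on_texts(stream_lines("corpus.txt"))
print(len(tokenizer.word_index), "distinct tokens")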

Cavarica2
- 25
- 6
2
votes
1 answer
Feed Forward Neural Network Language Model
I am currently in the process of trying to develop a feed-forward neural network n-gram language model using TensorFlow 2.0. Just to be clear, I do not want this to be implemented via a recurrent neural network; I simply want to use a few Dense…
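A minimal sketch of such a model in Keras, assuming a context of 2 previous words and placeholder sizes: the Embedding output is flattened and passed through Dense layers only, ending in a softmax over the vocabulary:
import numpy as np
from tensorflow.keras.layers import Dense, Embedding, Flatten
from tensorflow.keras.models import Sequential

vocab_size, context_size, embed_dim = 5000, 2, 64     # placeholder sizes

model = Sequential([
    Embedding(vocab_size, embed_dim, input_length=context_size),
    Flatten(),                                  # no recurrence: concatenate the context embeddings
    Dense(128, activation="relu"),
    Dense(vocab_size, activation="softmax"),    # distribution over the next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy batch: each row is (word_{t-2}, word_{t-1}) with target word_t.
X = np.random.randint(0, vocab_size, size=(32, context_size))
y = np.random.randint(0, vocab_size, size=(32,))
model.fit(X, y, epochs=1, verbose=0)
model.summary()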

Minura Punchihewa
- 1,498
- 1
- 12
- 35
2
votes
0 answers
Explicit likelihood of WordPiece used for pre-processing of BERT
At each iteration, the WordPiece algorithm for subword tokenization merges the pair of symbols that increases the likelihood the most. Now, in the literature it is only mentioned that this likelihood is the likelihood of the language model (e.g., the…
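For reference, the formulation usually given in secondary sources (stated here as the commonly cited derivation, not taken from a primary paper): under a unigram language model over the current symbol inventory, merging a and b changes the per-occurrence log-likelihood by roughly log P(ab) - log P(a) - log P(b), which is why implementations rank candidate merges by count(ab) / (count(a) * count(b)). A toy sketch with hypothetical counts:
import math
from collections import Counter

# Hypothetical symbol counts over a tokenized corpus.
symbol_counts = Counter({"h": 10, "u": 12, "g": 9, "hu": 6, "ug": 5})
total = sum(symbol_counts.values())

def merge_gain(a, b):
    # Approximate per-occurrence change in unigram LM log-likelihood when
    # merging the pair (a, b), with P(x) = count(x) / total.
    p = lambda s: symbol_counts[s] / total
    return math.log(p(a + b)) - math.log(p(a)) - math.log(p(b))

for a, b in [("h", "u"), ("u", "g")]:
    print(f"{a}+{b}: gain = {merge_gain(a, b):.4f}")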

SweetSpot
- 101
- 2
2
votes
1 answer
BERT + custom layer training performance going down with epochs
I'm training a classification model with custom layers on top of BERT. During training, the performance of this model goes down with increasing epochs (after the first epoch). I'm not sure what to fix here: is it the model or the…

user3741951
- 189
- 1
- 11