Questions tagged [language-model]
266 questions
1 vote
1 answer
Fine tuning of Bert word embeddings
I would like to load a pre-trained BERT model and fine-tune it, particularly its word embeddings, using a custom dataset.
The task is to use the word embeddings of chosen words for further analysis.
It is important to mention that…

Aviade
1 vote
0 answers
Obtaining the Probability of a Sentence using a Language Model
I have trained a language model using the following architecture,
model = tf.keras.Sequential([
tf.keras.layers.Embedding(total_words, 300, weights=[embeddings_matrix], input_length=inputs.shape[1],…

Minura Punchihewa
1 vote
0 answers
RNN Language Model in PyTorch predicting the same three words repeatedly
I am attempting to create a word-level language model using an RNN in PyTorch. During training, the loss stays about the same for the whole training set, and when I try to sample a new sentence the same three words are predicted in the same…

Ethan Baruh
1 vote
2 answers
Scripts missing for GPT-2 fine tune, and inference in Hugging-face GitHub?
I am following the documentation on the Hugging Face website, which says to use the script run_lm_finetuning.py to fine-tune GPT-2 and the script run_generation.py for inference.
However, both scripts don't…

raff7
1 vote
1 answer
How to use HuggingFace nlp library's GLUE for CoLA
I've been trying to use the HuggingFace nlp library's GLUE metric to check whether a given sentence is a grammatical English sentence, but I'm getting an error and am stuck, unable to proceed.
What I've tried so far:
reference and…

Dilrukshi Perera
1 vote
1 answer
Fine tuning a pretrained language model with Simple Transformers
In his article 'Language Model Fine-Tuning For Pre-Trained Transformers' Thilina Rajapakse (https://medium.com/skilai/language-model-fine-tuning-for-pre-trained-transformers-b7262774a7ee)
provides the following code snippet for fine-tuning a…

user8270077
1 vote
1 answer
How does masked_lm_labels argument work in BertForMaskedLM?
from transformers import BertTokenizer, BertForMaskedLM
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my…

Ahmad Beltagy
1 vote
1 answer
Define and Use new smoothing method in nltk language models
I'm trying to provide and test a new smoothing method for language models. I'm using nltk tools and don't want to redefine everything from scratch. Is there any way to define and use my own smoothing method in nltk models?
Edit:
I'm trying to do…

Behzad Shayegh
1 vote
0 answers
Trainable USE-lite-based classifier with SentencePiece input
I have heard that it is possible to use the pretrained Universal Sentence Encoder (USE), a neural language model from TF-hub, as part of a trainable model, e.g. a sentence classifier. Some versions of USE rely on the SentencePiece sub-word tokenizer,…

tpacker
1 vote
1 answer
GPT-2 language model: multiplying decoder-transformer output with token embedding or another weight matrix
I was reading the code of the GPT-2 language model. The transformation of hidden states into the probability distribution over the vocabulary is done in the following line:
lm_logits = self.lm_head(hidden_states)
Here,
self.lm_head =…

user3363813
1 vote
1 answer
while running huggingface gpt2-xl model embedding index getting out of range
I am trying to run the huggingface gpt2-xl model. I ran code from the quickstart page that loads the small gpt2 model and generates text with the following code:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
tokenizer =…

user3363813
1 vote
0 answers
Word2vec: what does it mean that the projection layer is shared?
My professor's slides compare the "Neural Net Language Model" (Bengio et al., 2003) with Google's word2vec (Mikolov et al., 2013). They say that, unlike Bengio's model, in word2vec "the projection layer is shared (not just the…

robertspierre
1 vote
0 answers
Next word Prediction RNN
This is my second post. I am really sorry if I sound awkward; I am new to machine learning. Reporting the question or giving a negative point will help me. Again, I am sorry for being unable to make my question clear.
Now coming to my question, I am working on…

srikant kumar
1 vote
1 answer
How to deal with large vocab_size when training a Language Model in Keras?
I want to train a language model in Keras, following this tutorial:
https://machinelearningmastery.com/develop-word-based-neural-language-models-python-keras/
My input is composed of:
lines num: 4823744
maximum line: 20
Vocabulary Size: 790609
Total…

jonb
1 vote
1 answer
How does Ulmfit's language model work when applied on a text classification problem?
I have been playing around with Ulmfit a lot lately and still cannot wrap my head around how the language model’s ability to make sound predictions about the next word affects the classification of texts. I guess my real problem is that I do not…

BigHead