Questions tagged [language-model]
266 questions
1 vote · 0 answers
Arguments of OpenIE for extracting fewer event triples
I'm new to NLP and I'm trying to use OpenIE to extract event triples from texts.
I looked into its documentation but don't quite understand its arguments. For example, max_entailments_per_clause controls the maximum number of entailments to produce…

R__ · 87 · 5
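
A minimal sketch of tuning that argument, assuming a local Stanford CoreNLP install (CORENLP_HOME set) and the stanza client; a smaller max_entailments_per_clause yields fewer triples:

# Sketch: fewer triples via openie.max_entailments_per_clause
# (assumes CoreNLP is installed and CORENLP_HOME points at it).
from stanza.server import CoreNLPClient

text = "Barack Obama was born in Hawaii."
with CoreNLPClient(
        annotators=["openie"],
        properties={"openie.max_entailments_per_clause": "1"},  # lower -> fewer entailed clauses per sentence
        be_quiet=True) as client:
    ann = client.annotate(text)
    for sentence in ann.sentence:
        for triple in sentence.openieTriple:
            print(triple.subject, triple.relation, triple.object)
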
1 vote · 0 answers
How to force GPT2 to generate specific tokens in each sentence?
My input is a string and the outputs are vector representations (corresponding to the generated tokens). I'm trying to force the outputs to contain specific tokens (e.g., 4 commas, 2 occurrences of the word "to", etc.). That is, each generated sentence must have…

Penguin · 1,923 · 3 · 21 · 51
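
One possible approach (a sketch, not the only option) is constrained beam search via transformers' force_words_ids, which guarantees each listed word appears at least once; note it does not enforce exact counts such as 4 commas:

# Sketch: constrained beam search with force_words_ids (transformers >= 4.17).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# require the word " to" (the leading space matters for GPT2's BPE) in every output
force_words_ids = [tokenizer(" to", add_special_tokens=False).input_ids]

inputs = tokenizer("The meeting was moved", return_tensors="pt")
out = model.generate(**inputs,
                     force_words_ids=force_words_ids,
                     num_beams=4,          # constraints require beam search
                     max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
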
1 vote · 1 answer
How to not break differentiability with a model's output?
I have an autoregressive language model in PyTorch that generates text, which is a collection of sentences, given one input:
output_text = ["sentence_1. sentence_2. sentence_3. sentence_4."]
Note that the output of the language model is in the form…

Penguin · 1,923 · 3 · 21 · 51
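
The usual workaround is to stay in logit space rather than decoding to strings; a minimal sketch with a straight-through Gumbel-softmax (all sizes here are placeholders):

# Sketch: a straight-through Gumbel-softmax keeps the graph intact.
import torch
import torch.nn.functional as F

logits = torch.randn(1, 10, 50257, requires_grad=True)  # stand-in for per-step model logits
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)  # discrete-looking but differentiable
embedding = torch.nn.Embedding(50257, 768)
soft_tokens = one_hot @ embedding.weight                # (1, 10, 768); gradients still flow
soft_tokens.sum().backward()
print(logits.grad is not None)                          # True: differentiability preserved
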
1 vote · 2 answers
Keras model with fasttext word embedding
I am trying to learn a language model that predicts the last word of a sentence given all the previous words, using Keras. I would like to embed my inputs using a learned fastText embedding model.
I managed to preprocess my text data and embed them using…

tristan19954 · 103 · 9
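
A sketch of one common wiring: build an embedding matrix from the fastText model and freeze it in a Keras Embedding layer (the model path and word_index here are stand-ins):

# Sketch: fastText vectors as a frozen Keras Embedding layer.
import numpy as np
import fasttext
from tensorflow.keras.layers import Embedding

ft_model = fasttext.load_model("cc.en.300.bin")   # hypothetical path to a trained fastText model
word_index = {"the": 1, "cat": 2}                 # stand-in for a fitted Keras Tokenizer's word_index
vocab_size = len(word_index) + 1                  # index 0 is reserved for padding

embedding_matrix = np.zeros((vocab_size, ft_model.get_dimension()))
for word, i in word_index.items():
    embedding_matrix[i] = ft_model.get_word_vector(word)  # subwords also cover OOV words

embedding_layer = Embedding(vocab_size, ft_model.get_dimension(),
                            weights=[embedding_matrix], trainable=False)
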
1 vote · 1 answer
N-gram Language Model returns nothing
I am following the tutorial here: https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-language-model-nlp-python-code/#h2_5 to create a language model, specifically the bit about the N-gram language model.
This is the completed…

Helana Brock · 45 · 15
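
A self-contained trigram sketch (not the tutorial's exact code) showing the usual cause: a context that never occurred in training yields an empty result, so guard the lookup:

# Self-contained trigram sketch: an unseen context "returns nothing".
from collections import defaultdict, Counter

corpus = "the cat sat on the mat . the cat ate".split()
model = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    model[(w1, w2)][w3] += 1

context = ("the", "cat")
if model[context]:
    print(model[context].most_common(1))  # e.g. [('sat', 1)]
else:
    print("unseen context - the model has nothing to return")
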
1 vote · 1 answer
Why is my Transformer implementation losing to a BiLSTM?
I am dealing with a sequence tagging problem and I am using a single Transformer encoder to obtain logits from each element of the sequence. Having experimented with both the Transformer and a BiLSTM, it looks like in my case the BiLSTM works better, so I…

Iacopo Ghinassi · 13 · 2
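
One frequent culprit (an assumption, not a diagnosis) is the learning-rate schedule: Transformers often need warmup that a BiLSTM does not. A sketch with PyTorch's LambdaLR, all numbers placeholders:

# Sketch: linear warmup for a Transformer tagger (all numbers are placeholders).
import torch

model = torch.nn.TransformerEncoderLayer(d_model=128, nhead=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

warmup_steps = 1000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))
# call optimizer.step() then scheduler.step() once per training batch
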
1 vote · 1 answer
Spacy download en_core_web_lg manually
I am trying to find a way to download the model en_core_web_lg == 2.3.1 for spaCy == 2.3.2.
Currently I am using:
python -m spacy download en_core_web_lg
import spacy
nlp = spacy.load("en_core_web_lg")
Is it possible to download the model file or…

data_person · 4,194 · 7 · 40 · 75
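
One way that avoids the download command: install the pinned archive directly from the spacy-models releases page, then load it as usual:

# Install the pinned archive straight from the spacy-models releases page,
# then load it as usual:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz
python -c "import spacy; nlp = spacy.load('en_core_web_lg')"
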
1 vote · 2 answers
RuntimeError: CUDA error: device-side assert triggered - BART model
I am trying to run the BART language model for a text generation task.
My code was working fine when I used it with another encoder-decoder model (T5), but with BART I am getting this error:
File "train_bart.py", line 89, in train
outputs =…

Minions · 5,104 · 5 · 50 · 91
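
Device-side asserts are usually an out-of-range index, often pad tokens left in the labels; a sketch of the common fix (masking pads to -100) plus CUDA_LAUNCH_BLOCKING for a readable traceback. bart-base here is a stand-in for the finetuned model:

# Sketch: mask pad labels and get a synchronous traceback.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA init; makes the traceback point at the real op

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

batch = tokenizer(["a short source text"], return_tensors="pt",
                  padding="max_length", max_length=16)
labels = tokenizer(["a short target"], return_tensors="pt",
                   padding="max_length", max_length=16).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # pad positions are ignored by the loss
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=labels)
print(outputs.loss)
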
1 vote · 0 answers
fill-mask usage from transformers pipeline
I fine-tuned a GPT2 language model and I am generating text from it using the following lines of code:
generator = pipeline('text-generation', tokenizer='gpt2', model='data/out')
print(generator('Once upon a time',…

Naqi · 135 · 12
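
Worth noting: fill-mask needs a masked LM, and GPT2 is causal with no mask token, so that pipeline won't work with it. A sketch with a masked model instead (roberta-base is an example choice):

# fill-mask needs a masked LM; GPT2 is causal and has no mask token.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-base")
print(unmasker("Once upon a <mask>."))  # roberta uses <mask>; BERT-style models use [MASK]
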
1 vote · 0 answers
Huggingface GPT transformers layers output
I'm trying to use a GPT language model and get the weights it assigns to each word in the last state of text generation. My model is a GPT2 from the transformers library. Below is how I call the pretrained model:
tokenizer =…

mitra mirshafiee · 393 · 6 · 17
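
A sketch of one way to inspect per-token states and attention weights, via output_hidden_states / output_attentions:

# Sketch: per-token hidden states and attention weights from GPT2.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("language models are", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

print(len(out.hidden_states))        # embeddings + one tensor per layer (13 for gpt2)
print(out.hidden_states[-1].shape)   # (batch, seq_len, 768) - last-layer per-token states
print(out.attentions[-1].shape)      # (batch, heads, seq_len, seq_len)
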
1 vote · 1 answer
BERT: Weights of input embeddings as part of the Masked Language Model
I looked through different implementations of BERT's Masked Language Model.
For pre-training there are two common versions:
the decoder simply takes the final embedding of the [MASK]ed token and passes it through a linear layer (without any…

Germans Savcisens · 158 · 12
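
A minimal sketch contrasting the two versions in PyTorch terms: an untied linear decoder versus one whose weight is tied to the input embedding matrix (sizes are BERT-base defaults):

# Sketch: untied vs. tied MLM decoder.
import torch.nn as nn

vocab_size, hidden = 30522, 768
embeddings = nn.Embedding(vocab_size, hidden)

untied_decoder = nn.Linear(hidden, vocab_size, bias=False)  # version 1: its own weight matrix
tied_decoder = nn.Linear(hidden, vocab_size, bias=False)    # version 2: shares the input
tied_decoder.weight = embeddings.weight                     # embedding matrix (weight tying)
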
1 vote · 0 answers
Running into trouble with a data path (os.path.isdir(path) returning False when it exists) using FB XLM
I'm trying to run an evaluation on FB's Transcoder (https://github.com/facebookresearch/TransCoder) which implements FB's XLM (cross language model): https://github.com/facebookresearch/XLM#iii-applications-supervised--unsupervised-mt
I have…

skidjoe · 493 · 7 · 16
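
A few quick checks that commonly explain a False isdir() for a path that exists (the path below is hypothetical):

# Common reasons os.path.isdir() is False for a path that "exists":
import os

path = "~/TransCoder/data"               # hypothetical path for illustration
path = os.path.expanduser(path)          # isdir() does not expand '~' by itself
path = os.path.abspath(path.strip())     # stray whitespace/newlines from configs break the lookup
print(os.path.isdir(path), os.getcwd())  # relative paths resolve against the current directory
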
1 vote · 1 answer
Solve Speed Difference in ktrain Predictor vs. Learner prediction?
I am using the ktrain huggingface library to build a language model. When implementing it for production, I noticed there is a huge difference in speed between a "learner prediction" and a "predictor prediction".
How come, and is there any way to speed up…

Jasper Schwenzow · 21 · 4
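
A hedged sketch, assuming the slowdown comes from per-call preprocessing and small batches: building the predictor with a larger batch_size is the usual speed-up (learner, preproc and texts come from your own training code):

# Sketch: larger prediction batches for a ktrain predictor.
import ktrain

predictor = ktrain.get_predictor(learner.model, preproc, batch_size=128)
preds = predictor.predict(texts)  # one large batched pass instead of many small ones
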
1 vote · 1 answer
HuggingFace - GPT2 Tokenizer configuration in config.json
The finetuned GPT2 model is uploaded to huggingface-models for inference.
The error below is observed during inference:
Can't load tokenizer using from_pretrained, please update its configuration: Can't load tokenizer for…

Woody · 930 · 9 · 23
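
The usual fix is to save and upload the tokenizer files alongside the model so from_pretrained finds them in the same repo; a sketch (the stock gpt2 tokenizer stands in for whatever was used in finetuning):

# Save (and upload) the tokenizer next to the model so from_pretrained
# can find vocab.json / merges.txt in the same repo or folder:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # stand-in for the finetuned model

model.save_pretrained("finetuned-gpt2")
tokenizer.save_pretrained("finetuned-gpt2")      # writes vocab.json, merges.txt, tokenizer_config.json
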
1 vote · 1 answer
Best approach for semantic similarity in large documents using BERT or LSTM models
I am trying to build a search application for resumes which are in .pdf format. For a given search query like "who is proficient in Java and worked in an MNC", the output should be the CV that is most similar. My plan is to read the pdf text and find…

Noxious Reptile · 838 · 1 · 7 · 24
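
A sketch of one common approach (an assumption, not the only option): embed the query and each CV with sentence-transformers and rank by cosine similarity; the CV strings below are toy stand-ins for parsed PDF text:

# Sketch: rank CVs against a query by cosine similarity of sentence embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
cvs = ["Java developer, five years at a multinational", "Pastry chef"]  # toy stand-ins
query = "who is proficient in Java and worked in an MNC"

cv_emb = model.encode(cvs, convert_to_tensor=True)
q_emb = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(q_emb, cv_emb)[0]
print(cvs[int(scores.argmax())])  # best-matching CV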