Questions tagged [language-model]
266 questions
5
votes
3 answers
Is positional encoding necessary for transformers in language modeling?
I am developing a language model like https://pytorch.org/tutorials/beginner/transformer_tutorial.html.
It is not clear to me whether positional encoding is necessary here.
As far as I understand, it is necessary for the language translation task…

Andrey
- 5,932
- 3
- 17
- 35
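
A minimal sketch of the sinusoidal encoding used in that tutorial, assuming PyTorch. The underlying point is that self-attention is permutation-invariant, so without some positional signal the model cannot distinguish word order, in language modeling just as in translation:

import math
import torch

def positional_encoding(max_len, d_model):
    # sinusoidal encoding from "Attention Is All You Need"
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)   # even dims: sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dims: cosine
    return pe

x = torch.randn(10, 512)                  # 10 token embeddings of width 512
x = x + positional_encoding(10, 512)      # added before the first encoder layer
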
5
votes
1 answer
Differences between en_vectors_web_lg and Glove vectors (spaCy)
https://spacy.io/models/en#en_vectors_web_lg
states that the model contains 1.1M keys, but
https://nlp.stanford.edu/projects/glove/
states that the GloVe vectors cover a 2.2M-word vocabulary.
May I know which vocabulary entries are missing?
Thank you very much.

hi bye
- 89
- 1
- 5
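
One way to verify the key count directly, assuming the en_vectors_web_lg package is installed (a sketch for inspection, not a full diff of the two vocabularies):

import spacy

nlp = spacy.load("en_vectors_web_lg")
vectors = nlp.vocab.vectors
print(vectors.n_keys, vectors.shape)  # number of keys vs. (rows, dims) of the table

Whether the missing million entries were pruned or merged is not something the counts alone can answer; diffing the two key sets would be needed.
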
5
votes
1 answer
Understanding Character Level Embedding in Keras LSTM
I am new to implementing language models with Keras RNNs. I have a dataset of standalone words (not drawn from a single paragraph) with the following statistics:
Total word samples: 1953
Total number of Distinct Characters: 33…

Parthosarathi Mukherjee
- 385
- 2
- 13
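
A minimal character-level setup consistent with the question's numbers (33 distinct characters; the maximum word length below is an assumption):

from tensorflow import keras

num_chars = 33 + 1   # 33 distinct characters, +1 reserved for the padding index 0
max_word_len = 12    # assumed: pad/truncate every word to this many characters

model = keras.Sequential([
    keras.layers.Embedding(num_chars, 16, mask_zero=True),  # char id -> 16-dim vector
    keras.layers.LSTM(32),                                   # reads the char sequence
    keras.layers.Dense(num_chars, activation="softmax"),     # next-character distribution
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
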
5
votes
4 answers
How to compute perplexity using KenLM?
Let's say we build a model on this:
$ wget https://gist.githubusercontent.com/alvations/1c1b388456dc3760ffb487ce950712ac/raw/86cdf7de279a2b9bceeb3adb481e42691d12fbba/something.txt
$ lmplz -o 5 < something.txt > something.arpa
From the perplexity…

alvas
- 115,346
- 109
- 446
- 738
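
For the follow-up computation, the kenlm Python bindings (assuming they are installed) expose both the log10 score and a per-word perplexity for a model built exactly like the one above:

import kenlm

model = kenlm.Model("something.arpa")
sentence = "language models are fun"
print(model.score(sentence))       # total log10 P(sentence), with <s> and </s> added
print(model.perplexity(sentence))  # 10 ** (-log10 P / (word count + 1 for </s>))
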
5
votes
0 answers
Predicting the probability of a sentence using TensorFlow
I am using this pre-trained TensorFlow model and trying to get the probability of a sentence. My primary task is to find, among several sentences, the one with the largest probability.
I am able to predict next words using this…

Riken Shah
- 3,022
- 5
- 29
- 56
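
Whatever the model, the usual recipe is the chain rule: log P(w1…wn) = Σ log P(wi | w<i). A sketch, where next_token_logprobs(prefix) is a hypothetical helper standing in for one forward pass of the pre-trained model:

def sentence_logprob(model, tokens):
    # chain rule: accumulate log P(w_i | w_<i) one position at a time
    total = 0.0
    for i, token in enumerate(tokens):
        dist = model.next_token_logprobs(tokens[:i])  # hypothetical helper
        total += dist[token]
    return total

# best = max(sentences, key=lambda s: sentence_logprob(model, s.split()))
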
5
votes
1 answer
Train TensorFlow language model with NCE or sampled softmax
I'm adapting the TensorFlow RNN tutorial to train a language model with an NCE loss or sampled softmax, but I still want to report perplexities. However, the perplexities I get are very weird: for NCE I get several million (terrible!) whereas for…

niefpaarschoenen
- 560
- 1
- 8
- 19
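
One common cause, sketched below with stand-in names (softmax_w, softmax_b, hidden are assumptions for the tutorial's variables): the sampled losses are only approximate training objectives, so perplexity has to be computed with the full softmax at evaluation time:

import tensorflow as tf

vocab_size, hidden_size, batch = 10000, 200, 32
softmax_w = tf.Variable(tf.random.normal([vocab_size, hidden_size]))
softmax_b = tf.Variable(tf.zeros([vocab_size]))
hidden = tf.random.normal([batch, hidden_size])   # stand-in for the RNN outputs
labels = tf.random.uniform([batch, 1], maxval=vocab_size, dtype=tf.int64)

# training: approximate loss over a sampled subset of the vocabulary
train_loss = tf.nn.sampled_softmax_loss(
    weights=softmax_w, biases=softmax_b, labels=labels,
    inputs=hidden, num_sampled=1024, num_classes=vocab_size)

# evaluation: full softmax, otherwise the reported perplexity is not comparable
logits = tf.matmul(hidden, softmax_w, transpose_b=True) + softmax_b
eval_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.reshape(labels, [-1]), logits=logits)
perplexity = tf.exp(tf.reduce_mean(eval_loss))
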
5
votes
1 answer
How to tune a Machine Translation model with a huge language model?
Moses is software for building machine translation models, and KenLM is the de facto language model toolkit that Moses uses.
I have a text file with 16GB of text and I use it to build a language model as follows:
bin/lmplz -o 5 < text.txt > text.arpa
The…

alvas
- 115,346
- 109
- 446
- 738
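
For a 16 GB corpus the resulting ARPA file will be huge; the usual step before plugging it into Moses is binarizing it with KenLM's build_binary (bin/build_binary text.arpa text.binary), after which the model can be memory-mapped instead of loaded whole. A sketch of querying such a model from Python, assuming the kenlm bindings:

import kenlm

# text.binary produced beforehand by: bin/build_binary text.arpa text.binary
model = kenlm.Model("text.binary")
print(model.order)                        # n-gram order, 5 here
print(model.score("this is a sentence")) # log10 probability under the model
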
4
votes
1 answer
Difference between Instruction Tuning and Non-Instruction Tuning for Large Language Models
What is the difference between instruction tuning and normal fine-tuning for large language models?
Also, the instruction tuning I'm referring to isn't the in-context/prompt kind.
All the recent papers about fine-tuning seem to be about instruction…

Flo
- 51
- 1
- 4
4
votes
0 answers
Keras LSTM predicting the next item, taking whole sequences or a sliding window. Will a sliding window need a stateful LSTM?
I have a sequence prediction problem in which, given the last n items in a sequence, I need to predict the next item.
I have more than 2 million sequences, each with a different number of timesteps (sequence length); some are just 5 and some are…

A.B
- 20,110
- 3
- 37
- 71
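
A sketch of the sliding-window option in plain NumPy (names are illustrative). Note that a stateless LSTM suffices here, since each window already carries all the context the model is meant to see; stateful LSTMs are only needed when context must persist across batches:

import numpy as np

def sliding_windows(seq, n):
    # turn one sequence into (last-n-items, next-item) training pairs
    X, y = [], []
    for i in range(len(seq) - n):
        X.append(seq[i:i + n])
        y.append(seq[i + n])
    return np.array(X), np.array(y)

X, y = sliding_windows([3, 7, 1, 9, 4, 2], n=3)
# X = [[3 7 1], [7 1 9], [1 9 4]],  y = [9 4 2]
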
4
votes
2 answers
When using padding in sequence models, is Keras validation accuracy valid/reliable?
I have a group of non-zero sequences of different lengths, and I am using a Keras LSTM to model them. I use the Keras Tokenizer to tokenize (token indices start from 1). To give the sequences equal lengths, I use padding.
An example of…

Amir Jalilifard
- 2,027
- 5
- 26
- 38
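
Since the tokens start at 1, index 0 is free to act as the padding value, and masking keeps padded steps from influencing the recurrent states. A sketch with assumed sizes:

from tensorflow import keras

vocab_size, max_len = 1000, 20   # assumed sizes

model = keras.Sequential([
    # mask_zero=True tells downstream layers to skip timesteps equal to 0
    keras.layers.Embedding(vocab_size, 32, mask_zero=True),
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

For a many-to-one model like this, accuracy is computed once per sequence, so padding matters mainly through the LSTM states; for per-timestep outputs, the propagated mask is also what should keep padded positions out of the metric.
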
4
votes
0 answers
squad2.0 training error: THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
!python -m torch.distributed.launch --nproc_per_node=8 /root/examples/run_squad.py \
--model_type bert \
--model_name_or_path bert-large-uncased-whole-word-masking \
--do_train \
--do_eval \
--do_lower_case \
--train_file…

TIGUZI
- 231
- 1
- 3
- 12
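
Error 100 means the process sees no CUDA device at all, which is worth verifying before touching the training script; a quick check with PyTorch:

import torch

print(torch.cuda.is_available())   # False here reproduces error 100
print(torch.cuda.device_count())   # needs to be >= 8 for --nproc_per_node=8

Note that --nproc_per_node=8 asks torch.distributed.launch for eight local processes, one per GPU; on a single-GPU machine (e.g. a Colab runtime) it should be 1.
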
4
votes
1 answer
Difference between spaCy models sm, md, lg
I can see that in the English spaCy models the medium model performs better than the small one, and the large model outperforms the medium one - but only marginally. However, in the description of the models, it is written that they have all been…

Bram Vanroy
- 27,032
- 24
- 137
- 239
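
The static word-vector tables are one concrete, inspectable difference between the packages; a sketch, assuming all three models are downloaded:

import spacy

for name in ("en_core_web_sm", "en_core_web_md", "en_core_web_lg"):
    nlp = spacy.load(name)
    print(name, nlp.vocab.vectors.shape)  # (rows, dims); sm ships essentially no static vectors
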
4
votes
0 answers
Alternative to one-hot encoding for output to a model when vocabulary size is very large
I was following this blog post, in which the author shows how to build a simple language model in Keras.
After separating, we need to one-hot encode the output word. This means converting it from an integer to a vector…

humble
- 2,016
- 4
- 27
- 36
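
One standard way around the giant one-hot target matrix is sparse_categorical_crossentropy, which consumes the integer word ids directly; a sketch with an assumed vocabulary size:

from tensorflow import keras

vocab_size = 50000   # assumed large vocabulary

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 128),
    keras.layers.LSTM(256),
    keras.layers.Dense(vocab_size, activation="softmax"),
])
# integer targets go in as-is: no (n_samples, vocab_size) one-hot matrix is built
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
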
4
votes
1 answer
How to relate the language model score of a whole sentence to those of the sentence's constituents
I trained a KenLM language model on around 5,000 English sentences/paragraphs. I want to query this ARPA model with two or more segments and see whether they can be concatenated to form a longer, hopefully more "grammatical," sentence. Here is…

Wei JIANG
- 71
- 4
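
One rough heuristic with the kenlm Python bindings (the model path and segments below are illustrative): compare the score of the concatenation against the segments scored with the sentence boundary suppressed at the join:

import kenlm

model = kenlm.Model("model.arpa")   # hypothetical path to the trained ARPA model
a, b = "the weather is nice", "so we went for a walk"

joint = model.score(a + " " + b)                               # log10 P of the concatenation
parts = model.score(a, eos=False) + model.score(b, bos=False)  # no </s><s> at the join
print(joint - parts)   # larger values suggest the join reads more naturally
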
4
votes
1 answer
Extract word/sentence probabilities from lm_1b trained model
I have successfully downloaded the 1B word language model trained using a CNN-LSTM (https://github.com/tensorflow/models/tree/master/research/lm_1b), and I would like to be able to input sentences or partial sentences to get the probability of each…

Matt
- 53
- 4
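
Once the graph is run step by step, per-word probabilities are just lookups into each step's softmax row; a small NumPy sketch (that the rows come from fetching lm_1b's softmax output is an assumption about how the graph is driven):

import numpy as np

def token_logprobs(softmax_rows, token_ids):
    # softmax_rows: one [vocab_size] distribution per prefix;
    # token_ids: the id of the word that actually followed each prefix
    rows = np.asarray(softmax_rows)
    ids = np.asarray(token_ids)
    return np.log(rows[np.arange(len(ids)), ids])

# sentence log-probability is then token_logprobs(rows, ids).sum()
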