Questions tagged [language-model]

266 questions
5 votes • 3 answers

Is positional encoding necessary for transformer in language modeling?

I am developing a language model like https://pytorch.org/tutorials/beginner/transformer_tutorial.html. It is not clear to me whether positional encoding is necessary here. As far as I understand, it is necessary for the language translation task…
Andrey • 5,932 • 3 • 17 • 35
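
In short, yes: self-attention is permutation-invariant, so without position information a decoder-only language model cannot tell token order apart. A minimal sketch of the sinusoidal encoding along the lines of that tutorial (dimension and module names are illustrative, and d_model is assumed even):

    import math
    import torch
    import torch.nn as nn

    class PositionalEncoding(nn.Module):
        """Adds fixed sinusoidal position information to token embeddings."""
        def __init__(self, d_model, max_len=5000):
            super().__init__()
            pe = torch.zeros(max_len, d_model)
            position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
            div_term = torch.exp(torch.arange(0, d_model, 2).float()
                                 * (-math.log(10000.0) / d_model))
            pe[:, 0::2] = torch.sin(position * div_term)
            pe[:, 1::2] = torch.cos(position * div_term)
            self.register_buffer("pe", pe.unsqueeze(1))  # (max_len, 1, d_model)

        def forward(self, x):
            # x: (seq_len, batch, d_model); add the encoding for each position
            return x + self.pe[: x.size(0)]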
5 votes • 1 answer

Differences between en_vectors_web_lg and Glove vectors (spaCy)

https://spacy.io/models/en#en_vectors_web_lg states that the model contains 1.1M keys, but https://nlp.stanford.edu/projects/glove/ states that the GloVe vectors have a vocabulary of 2.2M tokens. May I know which vocabulary entries are missing? Thank you very much.
hi bye • 89 • 1 • 5
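
One rough way to see what differs is to compare the two key sets directly. The sketch below assumes the spaCy v2 vectors API and a local copy of the GloVe 840B file, so treat the file path and attribute usage as illustrative:

    import spacy

    # Keys in spaCy's vector table for en_vectors_web_lg.
    nlp = spacy.load("en_vectors_web_lg")
    spacy_keys = {nlp.vocab.strings[k] for k in nlp.vocab.vectors.keys()}
    print("spaCy keys:", len(spacy_keys))

    # Tokens in the raw GloVe file (the filename is an assumption).
    glove_keys = set()
    with open("glove.840B.300d.txt", encoding="utf-8") as fh:
        for line in fh:
            glove_keys.add(line.split(" ", 1)[0])
    print("GloVe tokens:", len(glove_keys))
    print("In GloVe but not in spaCy:", len(glove_keys - spacy_keys))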
5 votes • 1 answer

Understanding Character Level Embedding in Keras LSTM

I am a newbie at implementing language models with Keras RNNs. I have a dataset of discrete words (not from a single paragraph) with the following statistics: total word samples: 1953; total number of distinct characters: 33…
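
For reference, a minimal character-level sketch in Keras, assuming the characters have already been mapped to integer ids; the layer sizes and maximum word length below are placeholders, not tuned values:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    num_chars = 33 + 1      # 33 distinct characters plus one id reserved for padding
    max_word_len = 20       # longest word in the dataset (an assumption)

    model = Sequential([
        # Each character id becomes a dense vector; mask_zero skips padded positions.
        Embedding(input_dim=num_chars, output_dim=16, mask_zero=True),
        LSTM(64),                                # encodes the character sequence
        Dense(num_chars, activation="softmax"),  # e.g. predict the next character id
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

    # One forward pass on dummy character ids just to check the shapes.
    dummy = np.random.randint(1, num_chars, size=(2, max_word_len))
    print(model(dummy).shape)   # (2, 34)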
5 votes • 4 answers

How to compute perplexity using KenLM?

Let's say we build a model on this: $ wget https://gist.githubusercontent.com/alvations/1c1b388456dc3760ffb487ce950712ac/raw/86cdf7de279a2b9bceeb3adb481e42691d12fbba/something.txt $ lmplz -o 5 < something.txt > something.arpa From the perplexity…
alvas • 115,346 • 109 • 446 • 738
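
For the record, the kenlm Python binding can answer this directly: Model.score returns the total log10 probability of a sentence and Model.perplexity converts that score into a per-word perplexity. A short sketch reusing the something.arpa file from the question (the query sentence is just an example):

    import kenlm

    # Load the 5-gram ARPA model built by lmplz above.
    model = kenlm.Model("something.arpa")

    sentence = "language modeling is fun"
    print(model.score(sentence, bos=True, eos=True))  # total log10 probability
    print(model.perplexity(sentence))                 # per-word perplexity from the same score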
5 votes • 0 answers

Predicting a probability of a sentence using tensorflow

I am using this pre-trained TensorFlow model and trying to get the probability of a sentence. My primary task is, out of several sentences, to find the one with the largest probability. I am able to predict next words using this…
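
Independently of the specific pre-trained model, the usual recipe is: the log-probability of a sentence is the sum of its per-token log-probabilities, and the "best" sentence is the argmax of that sum. A small sketch, assuming you already have a function that returns the model's per-token log-probs (the helper names and toy scores are hypothetical):

    def sentence_log_prob(token_log_probs):
        # log P(w1..wn) = sum_i log P(w_i | w_1 .. w_{i-1})
        return sum(token_log_probs)

    def most_probable(sentences, per_token_log_probs):
        # `per_token_log_probs` is a hypothetical callable that runs the pre-trained
        # model and returns the log-probability it assigns to each token of a sentence.
        return max(sentences, key=lambda s: sentence_log_prob(per_token_log_probs(s)))

    # Toy usage with a fake scorer, just to show the plumbing.
    fake_scores = {"the cat sat": [-1.2, -0.7, -2.1], "cat the sat": [-3.5, -4.0, -2.8]}
    print(most_probable(list(fake_scores), fake_scores.get))

Note that a raw joint probability favors shorter sentences; normalizing by length (or comparing per-word perplexities) is the usual correction when the candidates differ in length.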
5 votes • 1 answer

Train TensorFlow language model with NCE or sampled softmax

I'm adapting the TensorFlow RNN tutorial to train a language model with an NCE loss or sampled softmax, but I still want to report perplexities. However, the perplexities I get are very weird: for NCE I get several million (terrible!), whereas for…
niefpaarschoenen • 560 • 1 • 8 • 19
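
A likely explanation for the weird numbers is that sampled losses (NCE, sampled softmax) only normalize over a small sampled subset of the vocabulary, so exponentiating them does not give a comparable perplexity; the usual fix is to train with the sampled loss but evaluate with the full softmax. A sketch of that split, with all shapes and variable values as assumptions:

    import tensorflow as tf

    vocab_size, dim, batch = 10000, 200, 32
    softmax_w = tf.Variable(tf.random.normal([vocab_size, dim]))
    softmax_b = tf.Variable(tf.zeros([vocab_size]))
    hidden = tf.random.normal([batch, dim])          # stand-in for the RNN output
    labels = tf.random.uniform([batch, 1], maxval=vocab_size, dtype=tf.int64)

    # Training: cheap sampled loss. Its value is NOT a valid perplexity, because
    # it normalizes over only the sampled subset of the vocabulary.
    train_loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(
            weights=softmax_w, biases=softmax_b,
            labels=labels, inputs=hidden,
            num_sampled=64, num_classes=vocab_size))

    # Reporting: use the full softmax so that perplexity = exp(mean cross-entropy)
    # stays comparable to the tutorial's full-softmax numbers.
    logits = tf.matmul(hidden, softmax_w, transpose_b=True) + softmax_b
    full_loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=tf.squeeze(labels, axis=1), logits=logits))
    perplexity = tf.exp(full_loss)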
5 votes • 1 answer

How to tune a Machine Translation model with huge language model?

Moses is software for building machine translation models, and KenLM is the de facto language model toolkit that Moses uses. I have a text file with 16 GB of text and I use it to build a language model like so: bin/lmplz -o 5 text.arpa The…
alvas • 115,346 • 109 • 446 • 738
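
A common first step with an LM this size is to convert the ARPA file to KenLM's binary trie format, which loads much faster and needs far less RAM each time tuning reloads the model. The sketch below drives build_binary from Python and sanity-checks the result; the binary path and file names are assumptions:

    import subprocess
    import kenlm

    # Convert the 16 GB ARPA file to KenLM's binary trie format.
    subprocess.run(["bin/build_binary", "trie", "text.arpa", "text.binary"], check=True)

    # Sanity-check the binary model from Python before pointing moses.ini at it.
    model = kenlm.Model("text.binary")
    print(model.score("a quick sanity check", bos=True, eos=True))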
4 votes • 1 answer

Difference between Instruction Tuning vs Non Instruction Tuning Large Language Models

What is the difference between instruction tuning and normal fine-tuning for large language models? Also, the instruction tuning I'm referring to isn't the in-context/prompt kind. All the recent papers about fine-tuning seem to be about instruction…
Flo • 51 • 1 • 4
4 votes • 0 answers

Keras LSTM predicting the next item, taking whole sequences or a sliding window. Will a sliding window need a stateful LSTM?

I have a sequence prediction problem in which, given the last n items in a sequence, I need to predict the next item. I have more than 2 million sequences, each with a different number of timesteps (sequence length); some are just 5 and some are…
A.B • 20,110 • 3 • 37 • 71
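
A sketch of the sliding-window framing, assuming each sequence is already a list of integer item ids. With fixed-length windows like this, each sample carries its own context, so a plain (stateless) LSTM is usually sufficient; statefulness is only needed if context must persist across batches:

    import numpy as np

    def sliding_windows(sequence, window):
        # Turn one sequence into (last `window` items -> next item) training pairs.
        X, y = [], []
        for i in range(len(sequence) - window):
            X.append(sequence[i:i + window])
            y.append(sequence[i + window])
        return X, y

    # Toy sequence of item ids with a window of 3.
    X, y = sliding_windows([4, 8, 15, 16, 23, 42], window=3)
    print(np.array(X))  # [[ 4  8 15] [ 8 15 16] [15 16 23]]
    print(np.array(y))  # [16 23 42]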
4 votes • 2 answers

When using padding in sequence models, is Keras validation accuracy valid/reliable?

I have a group of non-zero sequences with different lengths, and I am using a Keras LSTM to model these sequences. I use the Keras Tokenizer to tokenize (token ids start from 1). To make the sequences the same length, I use padding. An example of…
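
The usual concern is that metrics computed over padded timesteps are misleading. The common remedy in Keras is to reserve id 0 for padding and set mask_zero=True on the Embedding so padded positions are skipped; recent TF/Keras versions also propagate that mask to the reported metrics. A minimal sketch with placeholder sizes:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    vocab_size, embed_dim = 5000, 64   # placeholder sizes

    model = Sequential([
        # Id 0 is reserved for padding (the Keras Tokenizer starts ids at 1);
        # mask_zero=True makes downstream layers skip the padded timesteps.
        Embedding(input_dim=vocab_size, output_dim=embed_dim, mask_zero=True),
        LSTM(64),
        Dense(vocab_size, activation="softmax"),  # predict the next token id
    ])
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="adam", metrics=["accuracy"])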
4 votes • 0 answers

squad2.0 training error: THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected

!python -m torch.distributed.launch --nproc_per_node=8 /root/examples/run_squad.py \ --model_type bert \ --model_name_or_path bert-large-uncased-whole-word-masking \ --do_train \ --do_eval \ --do_lower_case \ --train_file…
TIGUZI • 231 • 1 • 3 • 12
4 votes • 1 answer

Difference between spaCy models sm, md, lg

I can see that in the English spaCy models the medium model performs better than the small one, and the large model outperforms the medium one, but only marginally. However, the description of the models says that they have all been…
Bram Vanroy • 27,032 • 24 • 137 • 239
4 votes • 0 answers

Alternative to one-hot encoding for output to a model when vocabulary size is very large

I was following this blog post. In it, the author talks about how to build a language model in Keras and shows how to build a simple model. After separating the data, we need to one-hot encode the output word. This means converting it from an integer to a vector…
humble • 2,016 • 4 • 27 • 36
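
One common alternative, sketched below, is to keep the target word as a single integer id and use sparse_categorical_crossentropy, so the full one-hot vector of vocabulary size is never materialized; sampled softmax or a hierarchical softmax are the heavier options. The sizes here are placeholders:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    vocab_size, seq_len = 50000, 10   # placeholder sizes

    model = Sequential([
        Embedding(input_dim=vocab_size, output_dim=100),
        LSTM(128),
        Dense(vocab_size, activation="softmax"),
    ])
    # The targets stay as integer word ids (shape [batch]) instead of one-hot
    # vectors of length vocab_size; that is the whole memory saving.
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

    X = np.random.randint(1, vocab_size, size=(32, seq_len))
    y = np.random.randint(1, vocab_size, size=(32,))
    model.train_on_batch(X, y)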
4 votes • 1 answer

How to relate the language model score of a whole sentence to those of the sentence's constituents

I trained a KenLM language model on around 5000 English sentences/paragraphs. I want to query this ARPA model with two or more segments and see whether they can be concatenated to form a longer, hopefully more "grammatical", sentence. What follows is…
Wei JIANG • 71 • 4
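
A sketch of the relationship being asked about: kenlm's score is a log10 probability, so the score of the concatenation can be compared with the sum of the two segments scored separately (an "unrelated segments" baseline). The model filename and example segments are assumptions:

    import kenlm

    model = kenlm.Model("model.arpa")   # the ARPA model trained on the 5000 sentences

    seg_a = "the cat sat"
    seg_b = "on the mat"

    joint = model.score(seg_a + " " + seg_b, bos=True, eos=True)   # log10 P(a followed by b)
    parts = (model.score(seg_a, bos=True, eos=False) +
             model.score(seg_b, bos=False, eos=True))              # segments scored independently

    # If joint is clearly higher than parts, the n-grams spanning the boundary add
    # probability mass, which is weak evidence the concatenation reads as one sentence.
    print(joint, parts, joint - parts)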
4 votes • 1 answer

Extract word/sentence probabilities from lm_1b trained model

I have successfully downloaded the 1B word language model trained using a CNN-LSTM (https://github.com/tensorflow/models/tree/master/research/lm_1b), and I would like to be able to input sentences or partial sentences to get the probability of each…
Matt • 53 • 4