Questions tagged [language-model]
266 questions
1
vote
2 answers
Get the probability distribution of next word given a sequence using TensorFlow's RNN (LSTM) language model?
I'm running TensorFlow's RNN (LSTM) language model example here.
It runs and reports the perplexities perfectly.
What I want though is three things:
Given a sequence (e.g. w1 w5 w2000 w750) give me the probability distribution for the next word…

Ash
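A note on the first item in the question above. This is a minimal sketch, not the TensorFlow example's own code. It assumes the model yields one unnormalized score (logit) per vocabulary word for the next position. The names `softmax`, `vocab`, and `logits`, and the numbers themselves, are hypothetical. Applying a softmax to those scores gives the probability distribution over the next word:

```python
import math

def softmax(logits):
    """Turn unnormalized scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical four-word vocabulary and made-up logits for the next position.
vocab = ["w1", "w5", "w750", "w2000"]
logits = [2.0, 0.5, -1.0, 0.1]

probs = softmax(logits)
next_word_dist = dict(zip(vocab, probs))
most_likely = max(next_word_dist, key=next_word_dist.get)
```

In a real model the logits would come from the output layer at the last time step of the input sequence; how to fetch them depends on the model code.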
1
vote
1 answer
How to learn two sequences simultaneously through LSTM in Tensorflow/TFLearn?
I am learning the LSTM-based seq2seq model in TensorFlow. I can train a model on the given simple seq2seq examples.
However, in cases where I have to learn two sequences at once from a given sequence (e.g., learning previous…

user3480922
1
vote
1 answer
TensorFlow reset state during batch = sentence-level language model
What is the best way to build a recurrent language model (e.g. LSTM) that does not cross sentence boundaries? Or, put more generally, if you present a batch to the model, each row containing multiple sentences, how can you reset the state after seeing…

niefpaarschoenen
1
vote
2 answers
Dynamic LSTM model in Tensorflow
I am looking to design an LSTM model using TensorFlow, wherein the sentences are of different lengths. I came across a tutorial on the PTB dataset (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/ptb/ptb_word_lm.py). How does…

user3480922
1
vote
0 answers
What is a simple example of a TensorFlow file pipeline for a language model?
I am building an RNN language model in TensorFlow. My raw input consists of text files. I am able to tokenize them, so the data I am working with are sequences of integers that index into a vocabulary.
Following the example in…

W.P. McNeill
1
vote
0 answers
Using language model tool without any installation
I know of several language model tools, such as IRSTLM, MITLM, and SRILM. All of them need to be installed before they can be used to create a language model, etc.
However, I need a language model tool that does not require any installation and can be used…

ziLk
1
vote
1 answer
language model with SRILM
I'm trying to build a language model using SRILM.
I have a list of phrases and I create the model using:
./ngram-count -text corpus.txt -order 3 -ukndiscount -interpolate -unk -lm corpus.lm
After this I tried some examples to see the…

Daniele
1
vote
1 answer
Wrong number of dimensions: expected 0, got 1 with shape (1,)
I am doing word-level language modelling with a vanilla RNN. I am able to train the model, but for some reason I cannot get any samples/predictions from it. Here is the relevant part of the code:
train_set_x, train_set_y, voc =…

uyaseen
1
vote
2 answers
nltk language model TypeError: ngrams() got an unexpected keyword argument 'pad_symbol'
I'm executing the following code:
from nltk.corpus import brown
from nltk.model import NgramModel
lm = NgramModel(2, brown.words(categories='news'), estimator=None)
But I got an error:
I really don't know why I have this problem; is it a bug from…

Am1rr3zA
1
vote
1 answer
Correct parameters for wngram2idngram?
I am trying to generate an ARPA-format language model with the following commands:
text2wngram < weather.txt | grep -v " " > weather.wngram
wngram2idngram -vocab weather.vocab < weather.wngram > weather.idngram
idngram2lm -vocab_type 0…

g10dras
1
vote
1 answer
CMU Sphinx4 - Custom Language Model
I have a very specific requirement. I am working on an application that will allow users to speak their employee number, which has the format HN56C12345 (any sequence of alphanumeric characters), into the app. I have gone through the link:…

Qedrix
1
vote
1 answer
Why is my Sphinx4 Recognition poor?
I am learning how to use Sphinx4 using the Maven plug-in for Eclipse.
I took the transcribe demo found on GitHub and altered it to process a file of my own. The audio file is 16-bit, mono, 16 kHz. It is approximately 13 seconds long. I noticed that…

tmsBoston
1
vote
1 answer
Is likelihood calculated over the whole training set or a single example?
Suppose I have a training set of (x, y) pairs, where x is the input example and y is the corresponding target, a value in 1 ... k (k is the number of classes).
When calculating the likelihood of the training set, should it be calculated for…

Cheshie
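A note on the question above. This is a minimal sketch under the usual assumption that training examples are independent. The likelihood of the whole training set is then the product of the per-example probabilities p(y | x), so the log-likelihood of the set is the sum of the per-example log-probabilities. The function name and the predicted probabilities below are hypothetical:

```python
import math

def log_likelihood(predicted, targets):
    """Sum of log p(y_i | x_i) over all examples.

    predicted: one list of k class probabilities per example
    targets:   one class label in 1..k per example
    """
    # Labels are 1-based in the question, so shift to a 0-based index.
    return sum(math.log(p[y - 1]) for p, y in zip(predicted, targets))

# Made-up model outputs for two examples with k = 3 classes.
predicted = [[0.7, 0.2, 0.1],
             [0.1, 0.8, 0.1]]
targets = [1, 2]

per_example = [math.log(p[y - 1]) for p, y in zip(predicted, targets)]
total = log_likelihood(predicted, targets)
```

Here `per_example` holds the single-example values and `total` is the quantity for the whole set; dividing `total` by the number of examples gives the average that is often reported instead.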
1
vote
1 answer
n-gram probability count in ARPA file
I have started working on a problem related to language modelling, but some of the calculations are not clear to me. For example, consider the following simple text:
I am Sam Sam I am I do not like green eggs and ham
I have used BerkeleyLM to create the n-gram…

Muhammad Asaduzzaman
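A note on the question above. This is a minimal sketch of unsmoothed (maximum-likelihood) bigram estimates for the example text, treating it as a single token stream with no sentence-boundary markers. This is an assumption for illustration and not what the toolkit computes: toolkits typically apply smoothing, so the numbers in an ARPA file will differ. ARPA files store base-10 log probabilities, so the sketch converts to log10 as well. The helper name `bigram_mle` is hypothetical:

```python
from collections import Counter
import math

text = "I am Sam Sam I am I do not like green eggs and ham"
tokens = text.split()

# Count each adjacent pair, and how often each word occurs as a history.
bigram_counts = Counter(zip(tokens, tokens[1:]))
history_counts = Counter(tokens[:-1])

def bigram_mle(prev, word):
    """Unsmoothed estimate: count(prev, word) / count(prev as history)."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]

p_am_given_i = bigram_mle("I", "am")
log10_p = math.log10(p_am_given_i)  # the scale ARPA files use
```

In this text "I" occurs three times as a history and is followed by "am" twice, so the unsmoothed estimate of P(am | I) is 2/3.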
1
vote
0 answers
KenLM perplexity weirdness
I have 96 files, each containing ~10K lines of English text (tokenized, downcased). If I loop through the files (essentially doing k-fold cross-validation with k=#files) and build an LM (using bin/lmplz) for 95 and run bin/query on the held-out file…

dbl