Questions tagged [language-model]

266 questions
1
vote
2 answers

Get the probability distribution of next word given a sequence using TensorFlow's RNN (LSTM) language model?

I'm running TensorFlow's RNN (LSTM) language model example. It runs and reports the perplexities perfectly. What I want, though, is three things: given a sequence (e.g. w1 w5 w2000 w750), give me the probability distribution for the next word…
Ash
1
vote
1 answer

How to learn two sequences simultaneously through LSTM in TensorFlow/TFLearn?

I am learning the LSTM-based seq2seq model on the TensorFlow platform. I can train a model on the simple seq2seq examples provided. However, in cases where I have to learn two sequences at once from a given sequence (e.g. learning previous…
user3480922
1
vote
1 answer

TensorFlow reset state during batch = sentence-level language model

What is the best way to build a recurrent language model (e.g. LSTM) that does not cross sentence boundaries? Or, put more generally, if you present a batch to the model, each row containing multiple sentences, how can you reset the state after seeing…
niefpaarschoenen
1
vote
2 answers

Dynamic LSTM model in Tensorflow

I am looking to design an LSTM model using TensorFlow in which the sentences are of different lengths. I came across a tutorial on the PTB dataset (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/ptb/ptb_word_lm.py). How does…
1
vote
0 answers

What is a simple example of a TensorFlow file pipeline for a language model?

I am building an RNN language model in TensorFlow. My raw input consists of text files. I am able to tokenize them, so the data I am working with are sequences of integers that index into a vocabulary. Following the example in…
W.P. McNeill
1
vote
0 answers

Using language model tool without any installation

I know that there are some language model tools, such as IRSTLM, MITLM, and SRILM. All of them require installation before they can be used to create a language model, etc. However, I need a language model tool that does not need any installation and can be used…
ziLk
1
vote
1 answer

language model with SRILM

I'm trying to build a language model using SRILM. I have a list of phrases and I create the model using:
./ngram-count -text corpus.txt -order 3 -ukndiscount -interpolate -unk -lm corpus.lm
After this I tried to make some examples to see the…
Daniele
1
vote
1 answer

Wrong number of dimensions: expected 0, got 1 with shape (1,)

I am doing word-level language modelling with a vanilla RNN. I am able to train the model, but for some reason I am not able to get any samples/predictions from it. Here is the relevant part of the code: train_set_x, train_set_y, voc =…
uyaseen
1
vote
2 answers

NLTK language model TypeError: ngrams() got an unexpected keyword argument 'pad_symbol'

I'm executing the following code:
from nltk.corpus import brown
from nltk.model import NgramModel
lm = NgramModel(2, brown.words(categories='news'), estimator=None)
But I got an error. I really don't know why I have this problem; is it a bug from…
Am1rr3zA
1
vote
1 answer

Correct parameters for wngram2idngram?

I am trying to generate an ARPA-format language model with the following commands:
text2wngram < weather.txt | grep -v " " > weather.wngram
wngram2idngram -vocab weather.vocab < weather.wngram > weather.idngram
idngram2lm -vocab_type 0…
g10dras
1
vote
1 answer

CMU Sphinx4 - Custom Language Model

I have a very specific requirement. I am working on an application that will allow users to speak their employee number, which has the format HN56C12345 (any sequence of alphanumeric characters), into the app. I have gone through the link:…
Qedrix
1
vote
1 answer

Why is my Sphinx4 Recognition poor?

I am learning how to use Sphinx4 using the Maven plug-in for Eclipse. I took the transcribe demo found on GitHub and altered it to process a file of my own. The audio file is 16-bit, mono, 16 kHz. It is approximately 13 seconds long. I noticed that…
1
vote
1 answer

Is likelihood calculated over the whole training set or a single example?

Suppose I have a training set of (x, y) pairs, where x is the input example and y is the corresponding target, a value in 1 ... k (k is the number of classes). When calculating the likelihood of the training set, should it be calculated for…
Cheshie
1
vote
1 answer

n-gram probability count in ARPA file

I have started working on a problem related to language modelling, but some of the calculations are not clear to me. For example, consider the following simple text:
I am Sam
Sam I am
I do not like green eggs and ham
I have used berkeleylm to create the n-gram…
Muhammad Asaduzzaman
1
vote
0 answers

KenLM perplexity weirdness

I have 96 files, each containing ~10K lines of English text (tokenized, downcased). If I loop through the files (essentially doing k-fold cross-validation with k=#files) and build an LM (using bin/lmplz) for 95 and run bin/query on the held-out file…
dbl