Questions tagged [language-model]
266 questions
2
votes
1 answer
How to normalize probabilities of words in varying length sentences?
Let's say we have an RNN model that outputs the probability of a word given context (or no context) trained on a corpus.
We can chain the probability of each word in a sequence to get the overall probability of the sentence itself. But, because we…
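A common length normalization is to average the per-token log-probabilities (the quantity perplexity is built on) rather than comparing raw products; a minimal sketch with made-up token probabilities:

```python
import math

def sentence_scores(token_probs):
    """Return the raw log-probability of a sentence and its length-normalized version."""
    log_prob = sum(math.log(p) for p in token_probs)
    return log_prob, log_prob / len(token_probs)

short = [0.2, 0.1]               # 2-token sentence
long_ = [0.2, 0.1, 0.3, 0.25]    # 4-token sentence

raw_s, norm_s = sentence_scores(short)
raw_l, norm_l = sentence_scores(long_)
# The raw product always penalizes longer sentences; the per-token
# average makes sentences of different lengths comparable.
```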

Sanjay Krishna
- 157
- 1
- 7
2
votes
1 answer
Tensorflow num_classes parameter of nce_loss()
My understanding of noise contrastive estimation is that we sample some vectors from our word embeddings (the negative sample), and then calculate the log-likelihood of each. Then we want to maximize the difference between the probability of the…
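The core of that objective (in its simpler negative-sampling form, not full NCE with the noise-distribution correction) can be sketched in plain Python with made-up scores: maximize log σ(score) for the true word and log σ(−score) for each noise word:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neg_sampling_loss(true_score, noise_scores):
    """Binary-classification objective: push the true word's score up
    and the sampled noise words' scores down."""
    loss = -math.log(sigmoid(true_score))
    loss -= sum(math.log(sigmoid(-s)) for s in noise_scores)
    return loss

# A confident model (high true score, low noise scores) has low loss:
good = neg_sampling_loss(5.0, [-4.0, -3.0])
bad = neg_sampling_loss(-1.0, [2.0, 3.0])
```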

Aj Langley
- 127
- 9
2
votes
0 answers
Perplexity calculation for Language Model on 1 Billion Word Language Model Benchmark
Recently, I have been trying to implement an RNNLM based on this article.
There is an implementation with some LSTM factorization tricks, but it is similar to the original implementation by the author.
Preamble
1) The dataset is split into files and then…

fminkin
- 162
- 2
- 10
2
votes
1 answer
TensorFlow: loss jumps up after restoring RNN net
Environment info
Operating System: Windows 7 64-bit
Tensorflow installed from pre-built pip (no CUDA): 1.0.1
Python 3.5.2 64-bit
Problem
I have problems with restoring my net (an RNN character-based language model). Below is a simplified version with…

tmv
- 41
- 7
2
votes
0 answers
When loading a KenLM language model for scoring sentences, should the LM file size be less than the RAM size?
When loading a language model for scoring sentences, should the LM file ('bible.klm') be smaller than the available RAM?
import kenlm
model = kenlm.LanguageModel('bible.klm')
model.score('in the beginning was the word')

Arshiyan Alam
- 335
- 1
- 11
2
votes
1 answer
Reason for eval_config setting parameters to 1 in ptb_word_lm.py
While examining the setting for evaluation in Tensorflow's PTB language model, I am perplexed by this setting for the evaluation in eval_config:
eval_config = get_config()
eval_config.batch_size = 1
eval_config.num_steps = 1
in…

Sayan Ghosh
- 31
- 2
2
votes
1 answer
nltk.KneserNeyProbDist is giving 0.25 probability distribution for most of the trigrams
I am working on language modeling using nltk, with this essay as my corpus in the mypet.txt file. I am getting a 0.25 Kneser-Ney probability distribution for most of the trigrams. I don't know why. Is it right? Why is it doing so? This is my…
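For context, nltk's KneserNeyProbDist uses an absolute discount of 0.75 by default, so any trigram seen once in a context seen once gets a discounted maximum-likelihood term of (1 − 0.75)/1 = 0.25; a quick arithmetic check (my numbers, not the asker's corpus, and the backoff/continuation term is omitted):

```python
DISCOUNT = 0.75  # nltk's default absolute discount

def discounted_ml(trigram_count, bigram_context_count, discount=DISCOUNT):
    """Discounted maximum-likelihood term of Kneser-Ney smoothing
    (the continuation/backoff term is left out of this sketch)."""
    return max(trigram_count - discount, 0) / bigram_context_count

# A trigram seen once in a context seen once:
p = discounted_ml(1, 1)  # (1 - 0.75) / 1 = 0.25
```

This is why a small corpus full of one-off trigrams produces many identical 0.25 estimates.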

Jai Prak
- 2,855
- 4
- 29
- 37
2
votes
1 answer
Word prediction: neural net versus n-gram approach
For example, if I attempt to predict the next word in a sentence, I can use a bigram approach and compute the probability of a word occurring based on the previous word in the corpus.
If instead I use a neural net to predict the next word. The…
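The bigram side of that comparison is easy to sketch: maximum-likelihood estimates are just bigram counts divided by the count of the preceding word (toy corpus, my own example):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

unigrams = Counter(corpus[:-1])          # counts of words appearing as a left context
bigrams = Counter(zip(corpus, corpus[1:]))

def p_next(word, prev):
    """P(word | prev) estimated from bigram counts."""
    return bigrams[(prev, word)] / unigrams[prev]

p_cat = p_next("cat", "the")  # "the" is followed by "cat" in 2 of its 3 occurrences
```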

blue-sky
- 51,962
- 152
- 427
- 752
2
votes
1 answer
MemoryError raised when fitting a sequence-to-sequence LSTM using Keras+Theano
I was trying to implement a sequence-to-sequence language model. During training, the model takes in a sequence of 50-d word vectors generated by GloVe, and outputs a 1-of-V vector (V is the size of the vocabulary) meaning the next word, which thus…
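A frequent cause of such a MemoryError is materializing dense one-hot next-word targets of shape (samples, V); back-of-the-envelope arithmetic (illustrative numbers, not the asker's) shows why sparse integer targets are preferable:

```python
def one_hot_bytes(n_samples, vocab_size, bytes_per_float=4):
    """Memory needed for dense one-hot target vectors (float32)."""
    return n_samples * vocab_size * bytes_per_float

def sparse_bytes(n_samples, bytes_per_int=4):
    """Memory needed when each target is a single integer index."""
    return n_samples * bytes_per_int

dense = one_hot_bytes(1_000_000, 50_000)   # 200 GB of targets
sparse = sparse_bytes(1_000_000)           # 4 MB of targets
```

With a vocabulary of that size, a sparse loss (e.g. Keras's sparse_categorical_crossentropy) avoids ever building the dense array.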

高剑飞
- 21
- 1
2
votes
1 answer
What is the softmax_w and softmax_b in this document?
I'm new to TensorFlow and need to train a language model, but I ran into some difficulties while reading the documentation, as shown below.
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size,…
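In that tutorial, softmax_w and softmax_b are the output-projection weights and bias that map the LSTM's hidden state to vocabulary logits; a NumPy sketch of the operation (the shapes here are my own illustration, not the tutorial's values):

```python
import numpy as np

rng = np.random.default_rng(0)
lstm_size, vocab_size = 4, 6

hidden = rng.normal(size=(1, lstm_size))            # LSTM output for one step
softmax_w = rng.normal(size=(lstm_size, vocab_size))
softmax_b = np.zeros(vocab_size)

logits = hidden @ softmax_w + softmax_b             # tf.matmul(output, softmax_w) + softmax_b
probs = np.exp(logits) / np.exp(logits).sum()       # softmax over the vocabulary
```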

Lerner Zhang
- 6,184
- 2
- 49
- 66
2
votes
0 answers
How do I change the Keras text generation example from being on character level to word level?
The above code is more or less what the Keras documentation gives us as a language model. The thing is that this language model predicts characters, not words. Strictly speaking, a language model is supposed to predict full words.
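The main change is the tokenization step: index whitespace-separated words instead of characters, then train over word-index sequences; a minimal sketch of the vocabulary-building part (variable names are mine, not the Keras example's):

```python
text = "the quick brown fox jumps over the lazy dog"

words = text.split()                      # word-level tokens instead of characters
vocab = sorted(set(words))
word_to_idx = {w: i for i, w in enumerate(vocab)}
idx_to_word = {i: w for w, i in word_to_idx.items()}

# Training sequences become lists of word indices:
encoded = [word_to_idx[w] for w in words]
```

The sliding-window batching and the final softmax layer stay the same shape-wise, except the output dimension becomes the word vocabulary size.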
My question is,…

Cedric Oeldorf
- 579
- 1
- 4
- 10
2
votes
1 answer
How to calculate perplexity for a language model trained using keras?
Using Python 2.7 Anaconda on Windows 10
I have trained a GRU neural network to build a language model using keras:
print('Build model...')
model = Sequential()
model.add(GRU(512, return_sequences=True, input_shape=(maxlen,…
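Perplexity is the exponential of the average per-token cross-entropy, which is what Keras's categorical cross-entropy loss already measures; a sketch with made-up predicted probabilities:

```python
import math

def perplexity(true_token_probs):
    """exp of the mean negative log-probability assigned to the true tokens."""
    nll = -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)
    return math.exp(nll)

# Probabilities the model assigned to the actual next words:
ppl = perplexity([0.25, 0.25, 0.25, 0.25])  # uniform over 4 choices -> perplexity 4
```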

ishido
- 4,065
- 9
- 32
- 42
2
votes
1 answer
RNNLM using theano
I asked the same question on the theano user list but got no reply, so I'm wondering if anyone can help me here.
I am trying to re-implement the RNNLM of http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf based…

user200340
- 3,301
- 13
- 52
- 74
2
votes
1 answer
What is the next procedure after creating a CMUSphinx language model with my own dictionary?
I have created my own CMUSphinx language model for Arabic, for software that will listen to a user and apply commands, using my own dictionary that I made manually by hand. I converted the "arpa" language model type to the "dmp" language…

0x01Brain
- 798
- 2
- 12
- 28
2
votes
3 answers
Language Modelling toolkit
I would like to build a language model for a text corpus. Are there good out-of-the-box toolkits that would make this easier? The only toolkit I know of is the Statistical Language Modelling (SLM) Toolkit by CMU.

Dexter
- 11,311
- 11
- 45
- 61