Questions tagged [perplexity]

Perplexity is a measurement of how well a probability distribution or probability model predicts a sample.

From Wikipedia

In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.

40 questions
0
votes
0 answers

Gensim perplexity score increases

I am trying to calculate the perplexity score in Spyder for different numbers of topics with gensim, in order to find the best model parameters. However, the perplexity score is not decreasing as it is supposed to [1]. Besides, there seem to be more…
blackmamba
  • 15
  • 6
0
votes
1 answer

Latent Dirichlet Allocation Implementation with Gensim

I am doing a project about LDA topic modelling; I used gensim (Python) to do that. I read some references, and they said that to get the best topic model there are two parameters we need to determine: the number of passes and the number of topics. Is that…
0
votes
1 answer

How to compute the perplexity in text classification?

I'm doing dialect text classification with scikit-learn, naive Bayes, and CountVectorizer. So far I'm only doing text classification for 3 dialects. I'm going to add a new dialect (or actually, the formal language for those dialects). The problem is, the…
John Sall
  • 1,027
  • 1
  • 12
  • 25
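For a classifier that outputs class probabilities (e.g. `predict_proba` in scikit-learn), perplexity is the exponential of the average negative log-probability assigned to the true labels — i.e. the exponentiated cross-entropy. A minimal sketch with made-up probabilities:

```python
import math

# Hypothetical per-sample probabilities assigned to the *true* class,
# e.g. read off clf.predict_proba(X_test) at each sample's true-label column.
p_true = [0.70, 0.55, 0.90, 0.40]

# Cross-entropy: mean negative log-likelihood of the true labels.
nll = -sum(math.log(p) for p in p_true) / len(p_true)

# Perplexity is the exponentiated cross-entropy; 1.0 would be a perfect model.
perplexity = math.exp(nll)
print(perplexity)
```

The same quantity is what scikit-learn's `log_loss` computes before exponentiation, so `math.exp(log_loss(y_true, y_proba))` gives the classifier's perplexity directly.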
0
votes
2 answers

How can I test a word2vec over development data?

In a computer assignment, we are asked to implement the word2vec algorithm to generate dense vectors for some words using a neural network. I implemented the neural network and trained it on the training data. First, how can I test it on the test…
Ahmad
  • 8,811
  • 11
  • 76
  • 141
0
votes
1 answer

Calculating Perplexity and Memory Issues in Keras/Tensorflow

I'd like to evaluate my model with perplexity after each training epoch. I'm using Keras with the TensorFlow backend. The problem is that after each evaluation more and more memory is used but never released, so after a few epochs my system crashes. It…
Jurek
  • 3
  • 5
0
votes
0 answers

nltk: calculate perplexity of bigram/trigram models

I train bigram and trigram models: bgram = bigrams(sentences); trigram = trigrams(sentences). I then want to calculate perplexity: p = bgram.perplexity, but I get an error: AttributeError: 'generator' object has no attribute 'perplexity'. How should perplexity be…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
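The error above arises because `nltk.bigrams()` only yields an n-gram generator; perplexity lives on the fitted language models in `nltk.lm`. A sketch with a made-up toy corpus, assuming nltk's `lm` subpackage (Laplace smoothing keeps unseen bigrams from producing infinite perplexity):

```python
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import bigrams

# Toy training sentences (an assumption for illustration).
train_sents = [["the", "cat", "sat"], ["the", "dog", "sat"]]

n = 2
# Pads each sentence and produces the n-grams plus the flat vocabulary stream.
train_data, vocab = padded_everygram_pipeline(n, train_sents)
lm = Laplace(n)          # add-one smoothing, so unseen bigrams stay finite
lm.fit(train_data, vocab)

# Score a test sentence: pad it, extract its bigrams, ask the model.
test_bigrams = list(bigrams(pad_both_ends(["the", "cat", "sat"], n=n)))
print(lm.perplexity(test_bigrams))
```

For trigrams the same pipeline applies with `n = 3` and `nltk.util.trigrams`.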
0
votes
1 answer

How to calculate perplexity for LDA with Gibbs sampling

I fit an LDA topic model in R on a collection of 200+ documents (65k words total). The documents have been preprocessed and are stored in the document-term matrix dtm. Theoretically, I should expect to find 5 distinct topics in the corpus, but I…
Michael
  • 159
  • 1
  • 2
  • 14
0
votes
1 answer

How does language model evaluation work with unknown words?

For building language models, less frequent words ranked beyond the vocabulary size are replaced with 'UNK'. My question is: how do you evaluate language models that assign probabilities based on 'UNK'? Say we want to evaluate the perplexity of…
Ark
  • 13
  • 3
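The usual recipe is to fix the vocabulary on the training data, map everything outside it to 'UNK' in both training and test sets, and score 'UNK' like any other token. A minimal unigram sketch (stdlib only; the corpus and vocabulary size are made up):

```python
import math
from collections import Counter

train = "the cat sat on the mat the cat ran".split()
test = "the dog sat".split()   # 'dog' is out of vocabulary

# Keep the 3 most frequent word types; everything else becomes UNK.
vocab = {w for w, _ in Counter(train).most_common(3)}

def map_unk(tokens):
    return [w if w in vocab else "UNK" for w in tokens]

train_u, test_u = map_unk(train), map_unk(test)

# Unigram MLE probabilities over the UNK-mapped training data;
# UNK gets probability mass like any other token.
counts = Counter(train_u)
total = len(train_u)

# Perplexity of the test set; 'dog' is scored as UNK.
nll = -sum(math.log(counts[w] / total) for w in test_u) / len(test_u)
perplexity = math.exp(nll)
print(perplexity)
```

Because the model was trained on UNK-mapped text, the test-time UNK probability is well defined, and perplexities of models sharing the same vocabulary and UNK policy remain comparable.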
0
votes
1 answer

How can the perplexity of a language model be between 0 and 1?

In TensorFlow, I'm getting outputs like 0.602129 or 0.663941. It appears that values closer to 0 imply a better model, but perplexity is supposed to be calculated as 2^loss, which would imply the loss is negative. This doesn't make any…
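A quick sanity check on the question above: the per-token cross-entropy loss is an average of -log(p) terms with p ≤ 1, so it is non-negative, and perplexity = exp(loss) (or 2^loss with base-2 logs) is therefore always at least 1. Values between 0 and 1 mean the reported number is something else, most likely the raw loss. A sketch with hypothetical per-token probabilities:

```python
import math

# Hypothetical probabilities a language model assigns to each observed token.
token_probs = [0.2, 0.5, 0.1, 0.9]

# Cross-entropy: each -log(p) is >= 0 because p <= 1, so the mean is too.
loss = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Hence perplexity = exp(loss) can never drop below 1.
perplexity = math.exp(loss)
print(loss, perplexity)
```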
0
votes
1 answer

Isn't Tensorflow RNN PTB tutorial test measure and state reset wrong?

I have two questions about the TensorFlow PTB RNN tutorial code ptb_word_lm.py. The code blocks below are from that code. Is it okay to reset the state for every batch? self._initial_state = cell.zero_state(batch_size, data_type()) with tf.device("/cpu:0"): …