I am trying to get the document-topic probabilities after fitting an LDA model with the text2vec package in R.

The following commands generate the model:

lda_model <-  LDA$new(n_topics = n_topics, doc_topic_prior = 0.1, topic_word_prior = 0.01)
doc_topic_distr <- lda_model$fit_transform(x = quantdfm, n_iter = 2000, convergence_tol = 0.00001, n_check_convergence = 10, progressbar = FALSE)

quantdfm is the document-term matrix (dtm) built with the quanteda package, which I pass to the $fit_transform method.

I noticed that doc_topic_distr already contains the document-topic probabilities, even though I never asked for normalization. Is this correct? In a previous post, "How to get topic probability table from text2vec LDA", Dmitriy Selivanov suggested deriving such probabilities using:

doc_topic_prob = normalize(doc_topic_distr, norm = "l1")

However, when I run that command, doc_topic_distr and doc_topic_prob have the same values (I expected the former to contain integer counts and the latter fractions).

Please let me know whether this is the expected behavior of the code, or whether I have missed something.

Thanks.

ds_newbie
  • What does documentation say? – Dmitriy Selivanov Feb 20 '18 at 17:59
  • Page 24 says doc_topic_distribution is a dense matrix with documents as rows and topics as columns, and that each row should sum to 1. So it seems that it is already normalized. Please comment if I understand this correctly. – ds_newbie Feb 21 '18 at 15:02

1 Answer

According to the up-to-date documentation, LDA's fit_transform returns topic probabilities.
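
A quick way to confirm this on your own output is to check that every row sums to 1, and that L1 normalization therefore leaves the matrix unchanged. The sketch below is only an illustration: the small matrix is a made-up stand-in for doc_topic_distr (not real model output), and the normalization is written out in base R as an assumed equivalent of normalize(doc_topic_distr, norm = "l1") rather than calling text2vec itself.

```r
# Hypothetical stand-in for the matrix returned by lda_model$fit_transform();
# the values are invented for illustration (2 documents, 3 topics).
doc_topic_distr <- matrix(
  c(0.70, 0.20, 0.10,
    0.25, 0.25, 0.50),
  nrow = 2, byrow = TRUE
)

# If the rows are already probabilities, every row sums to 1.
row_totals <- rowSums(doc_topic_distr)
rows_are_probabilities <- isTRUE(
  all.equal(row_totals, rep(1, nrow(doc_topic_distr)))
)

# Manual L1 normalization in base R: divide each row by the sum of its
# absolute values (assumed to mirror normalize(..., norm = "l1")).
doc_topic_prob <- doc_topic_distr / rowSums(abs(doc_topic_distr))

# On rows that already sum to 1, normalization changes nothing.
unchanged <- isTRUE(all.equal(doc_topic_prob, doc_topic_distr))

print(rows_are_probabilities)
print(unchanged)
```

If the first check comes back FALSE on a real doc_topic_distr, the matrix probably holds counts rather than probabilities (as older versions apparently returned), and the normalization step would still be needed.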

Dmitriy Selivanov