Normalized topic document probabilities text2vec R

Question

I am trying to obtain the topic document probabilities after fitting an LDA model with the text2vec package in R.

The following commands fit the model:

lda_model <-  LDA$new(n_topics = n_topics, doc_topic_prior = 0.1, topic_word_prior = 0.01)
doc_topic_distr <- lda_model$fit_transform(x = quantdfm, n_iter = 2000, convergence_tol = 0.00001, n_check_convergence = 10, progressbar = FALSE)

quantdfm is the document-term matrix (dtm) built with the quanteda package, which I pass to the $fit_transform method.

I noticed that doc_topic_distr already contains the topic document probabilities, even though I never asked for normalization. Is this correct? I ask because in a previous post, "How to get topic probability table from text2vec LDA", Dmitriy Selivanov suggested deriving these probabilities with:

doc_topic_prob = normalize(doc_topic_distr, norm = "l1")

However, when I run that command, doc_topic_distr and doc_topic_prob have the same values (I expected the former to contain integers and the latter fractions).

Please let me know whether this is the expected behavior of the code or whether I have missed something.

Thanks.

Pg 24 says doc_topic_distribution is a dense matrix with rows as documents and columns as topics. Row sum should add to 1. So it seems that this is already normalized. Please comment if I understand this correctly. — ds_newbie, Feb 21 '18 at 15:02

score 0 · Accepted Answer · answered Feb 22 '18 at 12:59

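According to the up-to-date documentation, LDA's fit_transform returns topic probabilities, so doc_topic_distr is already normalized and applying normalize(..., norm = "l1") to it changes nothing.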
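A minimal base-R sketch of why the two matrices in the question come out identical. It does not call text2vec: the small matrix below is made-up data standing in for the fit_transform output, and l1_normalize is a hypothetical helper that mimics row-wise L1 normalization (dividing each row by the sum of its absolute values).

```r
# Toy stand-in for the matrix returned by fit_transform: 3 documents x 4 topics.
# The numbers are invented for illustration; each row already sums to 1.
doc_topic_distr <- matrix(
  c(0.70, 0.10, 0.10, 0.10,
    0.25, 0.25, 0.25, 0.25,
    0.05, 0.15, 0.30, 0.50),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), paste0("topic", 1:4))
)

# Hypothetical helper: row-wise L1 normalization in base R.
# rowSums() returns one value per row, and R recycles it down the columns,
# so every element is divided by the sum of its own row.
l1_normalize <- function(m) m / rowSums(abs(m))

doc_topic_prob <- l1_normalize(doc_topic_distr)

# Rows that already sum to 1 are left unchanged by L1 normalization.
print(rowSums(doc_topic_distr))
print(isTRUE(all.equal(doc_topic_prob, doc_topic_distr)))

# Raw counts, by contrast, do change under the same operation.
topic_counts <- matrix(c(7, 1, 1, 1,
                         2, 2, 2, 2), nrow = 2, byrow = TRUE)
print(l1_normalize(topic_counts))
```

The same check should carry over to the real output: if rowSums(doc_topic_distr) is 1 for every document, the matrix already holds probabilities.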

Dmitriy Selivanov
