Extract topic word probability matrix in gensim LdaModel

Question

I have the LDA model and the document-topic probabilities.

from gensim.models import LdaModel

# build the model on the corpus
ldam = LdaModel(corpus=corpus, num_topics=20, id2word=dictionary)
# get the document-topic probabilities
theta, _ = ldam.inference(corpus)

I also need the distribution of words for each topic, i.e. a topic-word probability matrix. Is there a way to extract this information?

Thanks!

score 6 · Accepted Answer · answered Feb 17 '17 at 16:49


The topic-term matrix (lambda) is accessible via:

topics_terms = ldam.state.get_lambda()

If you want a probability distribution, normalize each row so that it sums to 1:

import numpy as np
topics_terms_proba = np.apply_along_axis(lambda x: x/x.sum(),1,topics_terms)
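
As a minimal sketch of the same row normalization, the division can also be done with NumPy broadcasting instead of `apply_along_axis`. The small array below uses made-up values and only stands in for the output of `ldam.state.get_lambda()`; it is not real model output.

```python
import numpy as np

# Stand-in for ldam.state.get_lambda(): 2 topics x 3 terms of
# unnormalized weights (made-up values, for illustration only).
topics_terms = np.array([[1.0, 3.0, 4.0],
                         [2.0, 2.0, 6.0]])

# Divide each row by its own sum. keepdims=True keeps the sums as a
# column vector, so broadcasting applies the division row by row.
topics_terms_proba = topics_terms / topics_terms.sum(axis=1, keepdims=True)

print(topics_terms_proba)
```

Each row of the result sums to 1, so row k can be read as the word distribution of topic k. Newer gensim releases also appear to offer `ldam.get_topics()`, which should return this matrix already normalized; check the documentation for your installed version.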


arthur


When I use `ldam.state.get_lambda()` I get a numpy matrix, but there are no column names. How do I identify the words? – Clock Slave Feb 19 '17 at 09:09

To know which word corresponds to a given index, use `ldam.id2word`. For example `ldam.id2word[0]` is the word corresponding to the first column of the matrix. – arthur Feb 20 '17 at 10:04
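
To illustrate the index-to-word lookup described in the comment above, here is a small sketch that lists the most probable words of each topic. The `id2word` dict, the probability values, and the helper `top_words` are hypothetical stand-ins; with a real model you would pass `ldam.id2word` and the normalized topic-term matrix instead.

```python
import numpy as np

# Hypothetical stand-ins for ldam.id2word and the normalized matrix.
id2word = {0: "apple", 1: "bank", 2: "cat"}
topics_terms_proba = np.array([[0.125, 0.375, 0.5],
                               [0.6, 0.3, 0.1]])

def top_words(topic_row, id2word, n=2):
    """Return the n highest-probability (word, probability) pairs for one topic."""
    # argsort is ascending, so reverse it and keep the first n column indices.
    best = np.argsort(topic_row)[::-1][:n]
    return [(id2word[int(i)], float(topic_row[i])) for i in best]

for k, row in enumerate(topics_terms_proba):
    print(k, top_words(row, id2word))
```

Column j of the matrix corresponds to `id2word[j]`, so sorting a row's column indices by probability gives that topic's top words. gensim's own `ldam.show_topic(topicid, topn=10)` returns similar (word, probability) pairs directly from a trained model.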
