I ran LDA Mallet topic modeling on 37,000 article texts. Each article was categorized and its dominant topic was determined.

Now I want to create a dataframe that shows each topic's percentages for each article, in Python.

I want the data frame to look like this:


no |      Text     | Topic_Num_1 | Topic_Num_2 | .... | Topic_Num_25
01 | article text1 |   0.7529    |   0.0034    | .... | 0.0011
02 | article text2 |   0.3529    |   0.0124    | .... | 0.0001

.... (37,000 rows x 27 columns)

How would I do this?

+

All of my code so far is based on the following tutorial:

http://machinelearningplus.com/nlp/topic-modeling-gensim-python

How can I see the full list of topic probabilities for every single article?

Eunice

1 Answer

Here's a useful link for anyone that has just discovered this question.

I'm also pasting some example code, assuming that you have already built an LDA model and want to append the topic scores to a dataframe df.

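import gensim
import pandas as pd

# corpus, id2word, num_topics and df are assumed to exist already.
# Keyword arguments matter here: the second positional parameter is num_topics, not id2word.
lda_model = gensim.models.LdaMulticore(corpus=corpus, id2word=id2word, num_topics=num_topics)
# minimum_probability=0 keeps every topic; lda_model[corpus] drops topics below the default threshold.
lda_scores = lda_model.get_document_topics(corpus, minimum_probability=0)

# corpus2csc returns a sparse (num_topics x num_documents) matrix, so transpose it.
all_topics_csc = gensim.matutils.corpus2csc(lda_scores, num_terms=num_topics)
all_topics_numpy = all_topics_csc.T.toarray()
# Reuse df's index so the rows line up (assumes df rows are in the same order as corpus).
all_topics_pandas = pd.DataFrame(all_topics_numpy, index=df.index)

df = pd.concat([df, all_topics_pandas], axis=1)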
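
If you want the exact layout from the question (a Text column followed by Topic_Num_1 ... Topic_Num_N), here is a minimal sketch that does not depend on gensim at all. It assumes the per-document scores come as lists of (topic_id, probability) pairs, which is the shape gensim's LDA models yield when you index them with a corpus. The helper name topic_table and the sample numbers are made up for illustration.

```python
import numpy as np
import pandas as pd


def topic_table(doc_topics, texts, num_topics):
    """Build one row per article with one probability column per topic."""
    # Start from zeros so topics missing from a document's list stay at 0.
    scores = np.zeros((len(texts), num_topics))
    for row, pairs in enumerate(doc_topics):
        for topic_id, prob in pairs:
            scores[row, topic_id] = prob
    # Topic ids are 0-based; the column names in the question are 1-based.
    columns = [f"Topic_Num_{i + 1}" for i in range(num_topics)]
    table = pd.DataFrame(scores, columns=columns)
    table.insert(0, "Text", list(texts))
    return table


# Made-up scores for two articles and three topics (illustration only).
example_scores = [
    [(0, 0.75), (1, 0.20), (2, 0.05)],
    [(1, 0.90), (2, 0.10)],  # topic 0 was filtered out for this article
]
example_table = topic_table(example_scores, ["article text1", "article text2"], 3)
print(example_table)
```

With a real model you would pass the model's per-document output as doc_topics and your article texts as texts, in the same order as the corpus. Topics that were filtered out for a document simply show up as 0.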
yatzima