How to get word-topic probabilities using MALLET

Question

I've built a parallel topic model using MALLET.

I want to get the top words for each document.

To do that, I'm trying to get a word-topic probability matrix.

How would I achieve this?

What are you trying to get? Do you want the top topics for a document, the top words in a topic, or some mix of the two? – bean5, Dec 11 '13 at 00:22

Answer 1 (score 8)

When you build topics with MALLET, there is an option called --word-topic-counts-file. If you pass this option with a file name, MALLET writes one (topic, word, probability) entry per line to that file. You can later read the file in C, Java, R, or any other language to build the matrix you want.
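Once the file has been read into (topic, word, value) triples, turning them into a dense topic-by-word matrix is straightforward. Below is a minimal sketch in Python; `triples_to_matrix` is a hypothetical helper, and topics are assumed to be numbered from 0. The exact line layout of the file depends on your MALLET version (the next answer shows a sample that contains counts rather than probabilities), so the parsing step is left out here.

```python
def triples_to_matrix(triples):
    """Build a dense topic-by-word matrix from (topic, word, value) triples.

    Hypothetical helper. Returns (matrix, vocab), where matrix[t][j] holds
    the value for topic t and word vocab[j]. Pairs that never appear are 0.
    Topics are assumed to be integers numbered from 0.
    """
    triples = list(triples)
    # Sorted vocabulary gives a stable column order.
    vocab = sorted({word for _, word, _ in triples})
    col = {word: j for j, word in enumerate(vocab)}
    num_topics = max((topic for topic, _, _ in triples), default=-1) + 1
    matrix = [[0.0] * len(vocab) for _ in range(num_topics)]
    for topic, word, value in triples:
        # Accumulate in case the same (topic, word) pair occurs twice.
        matrix[topic][col[word]] += value
    return matrix, vocab
```

For example, `triples_to_matrix([(19, "elizabeth", 1), (19, "needham", 2)])` yields a 20-row matrix whose row 19 is `[1.0, 2.0]`. A plain list of lists keeps the sketch dependency-free; for a large vocabulary a sparse or NumPy representation would be more practical.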

edited Mar 13 '19 at 14:44 by GileBrt · answered Jun 17 '14 at 14:03 by Praveen

Answer 2 (score 2)

Just one point regarding Praveen's answer.

With --word-topic-counts-file, MALLET creates a file whose first few rows look something like this:

```
0 elizabeth 19:1
1 needham 19:2 17:1
2 died 19:2
3 mother 17:1 19:1 14:1
```

Each line starts with the word's index in the vocabulary, followed by the word and a list of topic:count pairs. So the first line means that the word elizabeth was assigned to topic 19 once; the second line means that the word needham was assigned to topic 19 twice and to topic 17 once; and so on.
Although this file doesn't give you explicit probabilities, you can use it to calculate them.
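As a first step toward those probabilities, the lines can be parsed into per-word topic counts. Here is a minimal sketch, assuming the `<index> <word> <topic>:<count> ...` layout shown in the sample above; `parse_word_topic_counts` is a hypothetical name, not part of MALLET.

```python
def parse_word_topic_counts(lines):
    """Parse lines of a MALLET --word-topic-counts-file (assumed layout).

    Each line is assumed to look like
        <type index> <word> <topic>:<count> <topic>:<count> ...
    Returns a dict mapping word -> {topic: count}.
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word = fields[1]  # fields[0] is the vocabulary index
        topic_counts = {}
        for pair in fields[2:]:
            topic, count = pair.split(":")
            topic_counts[int(topic)] = int(count)
        # A word with no topic:count pairs gets an empty dict.
        counts[word] = topic_counts
    return counts
```

To read an actual file, pass the open file object, e.g. `with open("word_topic_counts.txt") as f: counts = parse_word_topic_counts(f)` (the file name is just an example). Applied to the four sample lines, `counts["needham"]` would be `{19: 2, 17: 1}`.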

edited Mar 13 '19 at 14:14 · answered May 24 '16 at 08:48 by GileBrt

You'll need to include alpha values as well when you calculate the probabilities. I'm not entirely certain, but I believe the calculation would be as described in [this comment](http://stackoverflow.com/questions/33251703/how-to-get-a-probability-distribution-for-a-topic-in-mallet#comment69702638_33251703). – senderle Dec 20 '16 at 21:00
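A sketch of one way to turn those counts into probabilities, assuming the standard smoothed LDA estimate p(w | t) = (n[w][t] + β) / (n[t] + V·β), where n[t] is the total number of tokens assigned to topic t and V is the vocabulary size. In that estimate the topic-word smoothing parameter is β (beta); alpha plays the analogous role for document-topic proportions. β = 0.01 is an assumption here, which we believe matches MALLET's default for `--beta`; if hyperparameter optimization was enabled, the trained value may differ. Both function names are hypothetical, and the input is a dict mapping word -> {topic: count}.

```python
def word_topic_probabilities(counts, num_topics, beta=0.01):
    """Smoothed p(word | topic) from word -> {topic: count} counts.

    Assumes a symmetric topic-word prior beta:
        p(w | t) = (n[w][t] + beta) / (n[t] + V * beta)
    Returns a dict mapping topic -> {word: probability}.
    """
    vocab_size = len(counts)
    # n[t]: total number of tokens assigned to each topic.
    topic_totals = [0] * num_topics
    for topic_counts in counts.values():
        for topic, count in topic_counts.items():
            topic_totals[topic] += count
    probs = {}
    for t in range(num_topics):
        denom = topic_totals[t] + vocab_size * beta
        probs[t] = {w: (tc.get(t, 0) + beta) / denom for w, tc in counts.items()}
    return probs


def top_words(probs, topic, n=10):
    """Return the n most probable words for one topic."""
    ranked = sorted(probs[topic].items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]
```

Because each topic's probabilities are smoothed over the whole vocabulary, every row sums to 1, even for a topic with no assigned tokens.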
