I am using LDA in MALLET to explore my data. Training runs without any problem, but I also need the probabilities of the top words in each topic (say, the top 20 words).

I use this command:

bin\mallet train-topics  --input tutorial.mallet  --num-topics 40 --optimize-interval 20 --output-state topic-state_doc_40t.gz  --output-topic-keys tutorial_keys_doc_40t.txt --output-doc-topics tutorial_composition_doc_40t.txt

I do not know which option outputs the words' probabilities.

GeoBeez

2 Answers

You should be able to use the --topic-word-weights-file FILENAME option.

The format for the output file is

topic [tab] word [tab] weight

where weight is proportional to the probability of the word in the topic. Divide by the sum of the weights for a topic to get the normalized probability.
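The normalization step could be done with a few lines of Python. This is only a minimal sketch: it assumes the file has exactly the three tab-separated columns described above, and the function name `top_word_probabilities` and its `n_top` parameter are made up for illustration, not part of MALLET.

```python
from collections import defaultdict


def top_word_probabilities(path, n_top=20):
    """Return {topic: [(word, probability), ...]} with the n_top most probable words.

    Assumes each line of the file is: topic <tab> word <tab> weight.
    """
    weights = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            topic, word, weight = line.split("\t")
            weights[int(topic)][word] = float(weight)

    result = {}
    for topic, word_weights in weights.items():
        # Divide each weight by the topic's total to get a probability.
        total = sum(word_weights.values())
        ranked = sorted(word_weights.items(), key=lambda kv: kv[1], reverse=True)
        result[topic] = [(word, w / total) for word, w in ranked[:n_top]]
    return result
```

You would call it with whatever FILENAME you passed to --topic-word-weights-file, for example `top_word_probabilities("topic_word_weights.txt", n_top=20)` (the file name here is hypothetical).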

David Mimno

Late answer, but who knows, it might help someone else.

MALLET 2.0.8 added an option to output a diagnostics file containing a set of metrics for each topic and its top words. Word probability is one of them.

Simply add --diagnostics-file FILENAME to your train-topics command.

The number of words listed for each topic is the one set by "--num-top-words".

Detailed documentation is available here: http://mallet.cs.umass.edu/diagnostics.php. If you don't want to retrain your model, you can still generate the diagnostics file from your "state" file. Everything is described in the link.
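If you want to pull the word probabilities out of the diagnostics file programmatically, something like the sketch below might work. It rests on assumptions you should check against your own file: that the output is XML with one `topic` element per topic carrying an `id` attribute, and `word` children carrying a `prob` attribute with the word as text. The function name `diagnostics_word_probabilities` is hypothetical.

```python
import xml.etree.ElementTree as ET


def diagnostics_word_probabilities(path):
    """Return {topic_id: [(word, prob), ...]} from a diagnostics XML file.

    ASSUMPTION: the element names ("topic", "word") and attribute names
    ("id", "prob") match what your MALLET version writes; verify first.
    """
    root = ET.parse(path).getroot()
    result = {}
    for topic in root.iter("topic"):
        topic_id = int(topic.get("id"))
        result[topic_id] = [
            ((word.text or "").strip(), float(word.get("prob")))
            for word in topic.iter("word")
        ]
    return result
```

Pass it the FILENAME you gave to --diagnostics-file. If the attribute names differ in your file, adjust the two `get(...)` calls accordingly.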