When using Mallet, how do I get a list of topics associated with each document? I think I need to use train-topics and --output-topic-docs, but when I do, I get an error.

I'm using Mallet (2.0.8), and I use the following bash script to do my modeling:

MALLET=/Users/emorgan/desktop/mallet/bin/mallet
INPUT=/Users/emorgan/desktop/sermons
OBJECT=./object.mallet

$MALLET import-dir --input $INPUT --output $OBJECT --keep-sequence --remove-stopwords

$MALLET train-topics --input $OBJECT --num-topics 10 --num-top-words 1 \
--num-iterations 50 \
--output-doc-topics ./topics.txt \
--output-topic-keys ./keys.txt \
--xml-topic-report ./topic.xml \
--output-topic-docs ./docs.txt

Unfortunately, ./docs.txt does not get created. Instead I get the following error:

Exception in thread "main" java.lang.ClassCastException: java.net.URI cannot be cast to java.lang.String
    at cc.mallet.topics.ParallelTopicModel.printTopicDocuments(ParallelTopicModel.java:1773)
    at cc.mallet.topics.tui.TopicTrainer.main(TopicTrainer.java:281)

More specifically, I want Mallet to generate a list of documents and the associated topics assigned to them, or I want a list of topics and then the list of associated documents. How do I create such lists?

ericleasemorgan

1 Answer

At least in Mallet 2.0.7, the option that gives the desired table (the topic composition of each document) is --output-doc-topics ./topics.txt. The output format changed between 2.0.7 and 2.0.8, but the main content of the file stayed the same.
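
To turn that table into a plain "document, dominant topic" list, something like the sketch below may help. It is not part of Mallet. It assumes the dense tab-separated layout that 2.0.8 appears to write (document index, document name, then one proportion per topic in topic order); 2.0.7 is understood to write topic/proportion pairs instead, so the function would need adjusting there. The function name top_topic_per_doc and the sample rows are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "<doc name> TAB <dominant topic> TAB <proportion>"
# for every row of a doc-topics file.
# ASSUMED layout (dense, tab-separated):
#   <doc index> TAB <doc name> TAB <p(topic 0)> TAB <p(topic 1)> ...
top_topic_per_doc() {
  awk -F '\t' '
    /^#/   { next }   # skip a header line, if there is one
    NF < 3 { next }   # skip rows without any topic column
    {
      best = 3        # field number of the largest proportion seen so far
      for (i = 4; i <= NF; i++)
        if ($i + 0 > $best + 0) best = i
      # topic ids start at 0 and the first proportion sits in field 3
      printf "%s\t%d\t%s\n", $2, best - 3, $best
    }' "$1"
}

# Made-up sample rows standing in for ./topics.txt
sample=$(mktemp)
printf '0\tfile:/sermons/a.txt\t0.10\t0.70\t0.20\n' >  "$sample"
printf '1\tfile:/sermons/b.txt\t0.60\t0.30\t0.10\n' >> "$sample"

top_topic_per_doc "$sample"
```

Sorting the result on the second column (for example with sort -t "$(printf '\t')" -k2,2n) groups the documents by topic, which gives the other view asked for in the question.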

Sir Cornflakes
  • Yes, thank you. Version 2.0.7 outputs a file when the --output-doc-topics option is used. I will continue to use version 2.0.7 until the issue is resolved in newer versions of Mallet. Thank you, jknappen. – ericleasemorgan Mar 19 '17 at 18:37