When using Mallet, how do I get a list of topics associated with each document? I think I need to use train-topics and --output-topic-docs, but when I do, I get an error.
I'm using Mallet (2.0.8), and I use the following bash script to do my modeling:
# Paths to the Mallet launcher, the input corpus, and the imported data file
MALLET=/Users/emorgan/desktop/mallet/bin/mallet
INPUT=/Users/emorgan/desktop/sermons
OBJECT=./object.mallet

# Step 1: import the directory of plain-text files into Mallet's format
"$MALLET" import-dir --input "$INPUT" --output "$OBJECT" --keep-sequence --remove-stopwords

# Step 2: train the model and write the reports
"$MALLET" train-topics --input "$OBJECT" --num-topics 10 --num-top-words 1 \
    --num-iterations 50 \
    --output-doc-topics ./topics.txt \
    --output-topic-keys ./keys.txt \
    --xml-topic-report ./topic.xml \
    --output-topic-docs ./docs.txt
Unfortunately, ./docs.txt does not get created. Instead I get the following error:
Exception in thread "main" java.lang.ClassCastException: java.net.URI cannot be cast to java.lang.String
    at cc.mallet.topics.ParallelTopicModel.printTopicDocuments(ParallelTopicModel.java:1773)
    at cc.mallet.topics.tui.TopicTrainer.main(TopicTrainer.java:281)
More specifically, I want Mallet to generate either a list of documents with the topics assigned to each, or a list of topics with the documents associated with each. How do I create such lists?
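One possible workaround while --output-topic-docs fails is to derive the document-to-topic list from ./topics.txt, the file requested with --output-doc-topics. The sketch below rests on assumptions I have not confirmed: that Mallet 2.0.8 writes one tab-separated line per document (document index, document name, then one weight per topic in topic order), and that the file gets written at all, which may require dropping --output-topic-docs from the command so the run does not abort first. The function name dominant_topics is hypothetical and not part of Mallet.

```shell
#!/usr/bin/env bash
# Assumed layout of the --output-doc-topics file (not verified against Mallet):
#   doc-index <TAB> doc-name <TAB> weight-of-topic-0 <TAB> weight-of-topic-1 ...
# Lines starting with "#" are treated as a header and skipped.

# dominant_topics FILE
# Hypothetical helper: prints "doc-name <TAB> topic-id <TAB> weight" for the
# highest-weighted topic of each document in FILE.
dominant_topics() {
  awk -F'\t' '
    /^#/   { next }                  # skip header or comment lines
    NF < 3 { next }                  # skip lines that carry no weights
    {
      best = 3                       # field number of the largest weight so far
      for (i = 4; i <= NF; i++)
        if ($i + 0 > $best + 0) best = i
      # weights start at field 3, so field 3 corresponds to topic 0
      printf "%s\t%d\t%s\n", $2, best - 3, $best
    }
  ' "$1"
}
```

Calling dominant_topics ./topics.txt would then print one line per document. Sorting that output on the second column (for example with sort -t "$(printf '\t')" -k2,2n) groups the documents by topic, which approximates the topic-to-documents list.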