I am training a topic model in MALLET and need the topic-docs output, which lists the most prominent documents for each topic. I need it to interpret the less clear topics correctly, but MALLET keeps giving me empty txt files.

This is the command I use:

    bin\mallet train-topics --input cleandata1000.mallet --num-topics 250 --num-iterations 3000 --optimize-interval 50 --optimize-burn-in 50 --output-topic-keys 1000-300-3000-50-topic-keys.txt --output-topic-docs 1000-300-1000-50-topic-docs.txt --num-top-docs 20 --output-doc-topics 1000-300-1000-50-doc-topics.txt --doc-topics-threshold 0.01 --xml-topic-phrase-report 1000-300-1000-50-topic-phrase.xml --output-state 1000-300-1000-50-state.gz --use-symmetric-alpha true

Does anyone know what the cause could be?

Edit in response to David Mimno's 4 Nov comment:

The same thing happens with different data (where the docs have a different length).

I just ran some other models with MALLET's test data. Oddly, this run produced no output file at all ("en-topic-docs.txt" was not created):

    bin\mallet train-topics --input en.mallet --num-topics 5 --output-topic-docs en-topic-docs.txt

When I also ask for the topic keys as output, both files are created, but en-topic-docs.txt is empty:

    bin\mallet train-topics --input en.mallet --num-topics 5 --output-topic-keys en-topic-keys.txt --output-topic-docs en-topic-docs.txt

My bad: there is a recurring error message:

    Exception in thread "main" java.lang.ClassCastException: class java.net.URI cannot be cast to class java.lang.String (java.net.URI and java.lang.String are in module java.base of loader 'bootstrap')
        at cc.mallet.topics.ParallelTopicModel.printTopicDocuments(ParallelTopicModel.java:1773)
        at cc.mallet.topics.tui.TopicTrainer.main(TopicTrainer.java:281)

I don't know what this might mean.

Thank you for any help, you are saving my PhD :)

Maarten D.
  • Nothing looks obviously wrong. Is it possible the data file is empty or corrupted? Could error messages be getting dropped? – David Mimno Nov 04 '21 at 15:45

1 Answer

I was able to fix this by using the latest release on GitHub (202108) instead of MALLET 2.0.8. Now it works like a charm.

Instructions for using the development release: http://mallet.cs.umass.edu/download.php
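For anyone wondering what the exception itself means: Java throws a ClassCastException when code casts an object to a type it does not actually have. Going by the message, a java.net.URI was being cast to a String inside printTopicDocuments. I have not checked the MALLET source, so the sketch below is not MALLET's code; the class name, method names, and file URI are made up for illustration. It only reproduces the same kind of failure and shows a conversion that does not throw.

```java
import java.net.URI;

// Illustrative only: InstanceNameDemo, unsafeName, and safeName are
// hypothetical names, not part of MALLET.
public class InstanceNameDemo {

    // Casting fails at runtime when the object is not really a String.
    static String unsafeName(Object name) {
        return (String) name;
    }

    // String.valueOf calls toString(), so it works for a URI or a String.
    static String safeName(Object name) {
        return String.valueOf(name);
    }

    public static void main(String[] args) {
        // Assumption: the document name is held as a URI behind an Object reference.
        Object name = URI.create("file:/data/doc1.txt");

        System.out.println("safe: " + safeName(name));

        try {
            unsafeName(name);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }
    }
}
```

The cast compiles because the static type is Object, which is why the problem only shows up at runtime, when the topic-docs file is being written.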

Thank you for the pointers, David Mimno!

Maarten D.