I am running the following Mallet command (v2.0.8, May 3, 2016) under Linux 2.6.32-696.18.7.el6.x86_6 with Java SE Runtime Environment (build 1.7.0_05-b06):

bin/mallet train-topics --input html/$1/topic --num-topics $1 \
--output-doc-topics result \
--output-topic-docs top.gz \
--num-threads 20 \
--output-topic-keys keys.txt \
--optimize-interval 10

but after 1000 iterations I only get this output:

<1000> LL/token: -8.98037
Total time: 1 hours 47 minutes 18 seconds
Exception in thread "main" java.lang.ClassCastException: java.net.URI cannot be cast to java.lang.String
        at cc.mallet.topics.ParallelTopicModel.printTopicDocuments(ParallelTopicModel.java:1773)
        at cc.mallet.topics.tui.TopicTrainer.main(TopicTrainer.java:281)

Any suggestions as to what this means or how to avoid the problem? Is there a way to proceed?

Daniel Feenberg NBER

1 Answer

Thanks for using Mallet! The immediate cause is that the 2.0.8 release expects the "Name" field to be a String, not a URI. It looks like this was fixed in a pull request from Te Rutherford shortly after the 2.0.8 release. There should be a 2.1 pre-release available in the next few weeks.
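
To illustrate the failure mode, here is a minimal sketch. It is not Mallet's actual source: the class `NameCastDemo`, the helper `nameToString`, and the example file URI are all hypothetical. It only shows why casting a name that was stored as a `java.net.URI` to `String` throws, and how a type-agnostic conversion avoids the exception.

```java
import java.net.URI;

class NameCastDemo {
    // Hypothetical helper (not part of Mallet): turns a name of any type into text
    // without assuming it is already a String.
    static String nameToString(Object name) {
        return String.valueOf(name);
    }

    public static void main(String[] args) {
        // Assumption: the name was stored as a URI, as the stack trace suggests.
        Object name = URI.create("file:/data/topic/doc1.txt");

        try {
            // A direct cast fails at runtime because a URI is not a String.
            String direct = (String) name;
            System.out.println(direct);
        } catch (ClassCastException e) {
            System.out.println("Cast failed: " + e.getMessage());
        }

        // Converting through String.valueOf works for a String and a URI alike.
        System.out.println(nameToString(name));
    }
}
```

A conversion like this accepts either type, which is presumably the kind of change the fix makes, though I have not checked the exact patch.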

David Mimno
  • Which is the "Name" field? Is it in my command line? I don't see a URI in my command line but I am happy to change the command line if that would help. Can I quote a filepath or stick to filenames (not paths) or just wait for 2.1? – Daniel Feenberg Jul 05 '18 at 16:52
  • How are you creating an instance list? I don't remember how this situation arises. – David Mimno Jul 06 '18 at 12:35
  • As far as I can tell, the command line Mallet does not provide for specifying an "instance list". So I can't answer your question. In any case it appears that using files in the current directory allows me to specify filenames without a slash, and that seems to allow Mallet to work. – Daniel Feenberg Jul 06 '18 at 19:00
  • @david-mimno I also have this problem. It has been over a year since your answer and no version has been released after 2.0.8. Do you know of a version that doesn't have this problem? – Sara Fahim Nov 24 '19 at 11:13
  • The current development version on GitHub has this fix. – David Mimno Nov 25 '19 at 15:48