I am trying to build a model with 500 or 1000 topics on a dataset of 1M documents with Mallet LDA. After 60 iterations I get an ArrayIndexOutOfBoundsException. The error message is below:

<60> LL/token: -7.64386
overflow on type 8
java.lang.ArrayIndexOutOfBoundsException: 500
at cc.mallet.topics.WorkerRunnable.buildLocalTypeTopicCounts(WorkerRunnable.java:208)
at cc.mallet.topics.WorkerRunnable.run(WorkerRunnable.java:280)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
overflow on type 8

The command I am running is:

bin/mallet train-topics \
  --input data.mallet \
  --output-model lda.model \
  --inferencer-filename topic-inferencer-model.mallet \
  --output-topic-keys topic-keys.txt \
  --topic-word-weights-file topic-word-weights.txt \
  --word-topic-counts-file word-topic-counts-file.txt \
  --output-doc-topics doc-topics.txt \
  --num-topics 500 \
  --num-threads 16 \
  --num-iterations 1500 \
  --use-symmetric-alpha FALSE

Any suggestion is much appreciated.

ak.
  • Do you know which version of Mallet you're running? – David Mimno Dec 30 '16 at 19:17
  • Also: setting `--use-symmetric-alpha` to false by itself doesn't currently do anything; you need to set an alpha optimization interval as well. This is a reasonable thing to expect to work, though, and it should go into a future version. – David Mimno Dec 30 '16 at 19:21
  • I am using Mallet 2.0.8RC3 on Ubuntu. Thank you for your advice. I will try it again with an optimization interval and see if it works. – ak. Jan 02 '17 at 09:36
  • @yellowbedsheet Did it ever work for you? I'm having the same issue trying to use ParallelTopicModel class in my Java code. – Mike Borkland Jul 10 '18 at 22:02
  • @MikeBorkland No, it didn't. And I haven't tried building a model again on a large scale. – ak. Aug 23 '18 at 19:13
  • @yellowbedsheet I got it to work by manually setting the number of threads to 1. There is a method that allows you to do that easily. The problem is clearly with the code used to implement the concurrency. – Mike Borkland Aug 23 '18 at 19:25
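
The two suggestions in the comments above (set an alpha optimization interval, and fall back to a single thread) can be combined into one command. What follows is a minimal sketch, not a confirmed fix: the interval value of 10 is an assumption chosen for illustration and does not come from this thread, and the single-thread workaround was reported by one commenter using the Java API (presumably via `ParallelTopicModel.setNumThreads`), not verified on the command line. Only a subset of the original output options is shown; the others can be appended unchanged.

```shell
# Assumed values for illustration; neither number is taken from the thread.
NUM_THREADS=1          # single thread, the workaround reported in the comments
OPTIMIZE_INTERVAL=10   # hypothetical interval; enables alpha optimization

# Same input and model files as the original command, with the two flags changed.
MALLET_CMD="bin/mallet train-topics \
  --input data.mallet \
  --output-model lda.model \
  --output-topic-keys topic-keys.txt \
  --output-doc-topics doc-topics.txt \
  --num-topics 500 \
  --num-threads $NUM_THREADS \
  --num-iterations 1500 \
  --optimize-interval $OPTIMIZE_INTERVAL"

# Print the command for inspection; run it only if the Mallet launcher is present.
echo "$MALLET_CMD"
if [ -x bin/mallet ]; then
  $MALLET_CMD
fi
```

Expect a single-threaded run on 1M documents to be much slower than the 16-thread run. If it completes past iteration 60 without the exception, that would support the suspicion that the multi-threaded code path is involved.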
