I'm using Latent Dirichlet Allocation (LDA) from Spark MLlib via the Java API.
The following works fine:
LDAModel ldaModel = new LDA()//
.setK( NUM_TOPICS )//
.setMaxIterations( MAX_ITERATIONS )//
.run( corpus );
This uses (I believe) the default EM optimizer.
However, when I try to use the online variational optimizer (OnlineLDAOptimizer) instead, as follows:
OnlineLDAOptimizer optimizer = new OnlineLDAOptimizer()//
.setMiniBatchFraction( 2.0 / MAX_ITERATIONS );
LDAModel ldaModel = new LDA()//
.setK( NUM_TOPICS )//
.setOptimizer( optimizer )//
.setMaxIterations( MAX_ITERATIONS )//
.run( corpus );
I get the following exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 11.0 failed 1 times, most recent failure: Lost task 1.0 in stage 11.0 (TID 50, localhost): java.lang.IndexOutOfBoundsException: (0,2) not in [-3,3) x [-2,2)
at breeze.linalg.DenseMatrix.apply(DenseMatrix.scala:84)
at breeze.linalg.Matrix$class.apply(Matrix.scala:39)
...
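For what it's worth, the exception is a Breeze matrix bounds failure, so one thing I'd like to rule out is an out-of-range term index in the term-count vectors that make up the corpus (every index should lie in [0, vocabSize)). A minimal standalone sketch of that kind of check, with no Spark dependency (the class and method names here are just illustrative, not part of any API):

```java
// Hypothetical sanity check: every term index used to build a sparse
// term-count vector must satisfy 0 <= index < vocabSize, or downstream
// matrix code (e.g. Breeze's DenseMatrix.apply) can fail with an
// IndexOutOfBoundsException like the one above.
public class VocabBoundsCheck {
    static boolean indicesInRange(int vocabSize, int[] termIndices) {
        for (int i : termIndices) {
            if (i < 0 || i >= vocabSize) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(indicesInRange(5, new int[]{0, 2, 4})); // true
        System.out.println(indicesInRange(5, new int[]{0, 2, 5})); // false: 5 >= vocabSize
    }
}
```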
Has anyone had any success getting the online optimizer to work in the Java version of Spark? As far as I can tell, switching the optimizer is the only difference between the two runs.