I'm new to Spark and I'm currently trying to build a neural network using the deeplearning4j API. The training works just fine, but I'm encountering problems during evaluation. I get the following error message:

18:16:16,206 ERROR ~ Exception in task 0.0 in stage 14.0 (TID 19)
java.lang.IllegalStateException: Network did not have same number of parameters as the broadcasted set parameters at org.deeplearning4j.spark.impl.multilayer.evaluation.EvaluateFlatMapFunction.call(EvaluateFlatMapFunction.java:75)

I can't seem to find the reason for this problem, and information on Spark and deeplearning4j is sparse. I essentially took the structure from this example: https://github.com/deeplearning4j/dl4j-spark-cdh5-examples/blob/2de0324076fb422e2bdb926a095adb97c6d0e0ca/src/main/java/org/deeplearning4j/examples/mlp/IrisLocal.java.

This is my code:

public class DeepBeliefNetwork {

private JavaRDD<DataSet> trainSet;
private JavaRDD<DataSet> testSet;

private int inputSize;
private int numLab;
private int batchSize;
private int iterations;
private int seed;
private int listenerFreq;
MultiLayerConfiguration conf;
MultiLayerNetwork model;
SparkDl4jMultiLayer sparkmodel;
JavaSparkContext sc;

MLLibUtil mllibUtil = new MLLibUtil();

public DeepBeliefNetwork(JavaSparkContext sc, JavaRDD<DataSet> trainSet, JavaRDD<DataSet> testSet, int numLab,
        int batchSize, int iterations, int seed, int listenerFreq) {

    this.trainSet = trainSet;
    this.testSet = testSet;
    this.numLab = numLab;
    this.batchSize = batchSize;
    this.iterations = iterations;
    this.seed = seed;
    this.listenerFreq = listenerFreq;
    this.inputSize = testSet.first().numInputs();
    this.sc = sc;



}

public void build() {
    System.out.println("input Size: " + inputSize);
    System.out.println(trainSet.first().toString());
    System.out.println(testSet.first().toString());

    conf = new NeuralNetConfiguration.Builder().seed(seed)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(1.0).iterations(iterations).momentum(0.5)
            .momentumAfter(Collections.singletonMap(3, 0.9))
            .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT).list(4)
            .layer(0,
                    new RBM.Builder().nIn(inputSize).nOut(500).weightInit(WeightInit.XAVIER)
                            .lossFunction(LossFunction.RMSE_XENT).visibleUnit(RBM.VisibleUnit.BINARY)
                            .hiddenUnit(RBM.HiddenUnit.BINARY).build())
            .layer(1,
                    new RBM.Builder().nIn(500).nOut(250).weightInit(WeightInit.XAVIER)
                            .lossFunction(LossFunction.RMSE_XENT).visibleUnit(RBM.VisibleUnit.BINARY)
                            .hiddenUnit(RBM.HiddenUnit.BINARY).build())
            .layer(2,
                    new RBM.Builder().nIn(250).nOut(200).weightInit(WeightInit.XAVIER)
                            .lossFunction(LossFunction.RMSE_XENT).visibleUnit(RBM.VisibleUnit.BINARY)
                            .hiddenUnit(RBM.HiddenUnit.BINARY).build())
            .layer(3, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD).activation("softmax").nIn(200)
                    .nOut(numLab).build())
            .pretrain(true).backprop(false).build();

}

public void trainModel() {

    model = new MultiLayerNetwork(conf);
    model.init();
    model.setListeners(Collections.singletonList((IterationListener) new ScoreIterationListener(listenerFreq)));

    // Create Spark multi layer network from configuration

    sparkmodel = new SparkDl4jMultiLayer(sc.sc(), model);
    sparkmodel.fitDataSet(trainSet);

    // Evaluation
    Evaluation evaluation = sparkmodel.evaluate(testSet);
    System.out.println(evaluation.stats());
}
}

Does anyone have advice about how to handle my JavaRDD? I believe the problem lies there.

Thanks a lot!

EDIT1

I'm using deeplearning4j version 0.4-rc.10 and Spark 1.5.0. Here's the stack trace:

11:03:53,088 ERROR ~ Exception in task 0.0 in stage 16.0 (TID 21) java.lang.IllegalStateException: Network did not have same number of parameters as the broadcasted set parameters
at org.deeplearning4j.spark.impl.multilayer.evaluation.EvaluateFlatMapFunction.call(EvaluateFlatMapFunction.java:75)
at org.deeplearning4j.spark.impl.multilayer.evaluation.EvaluateFlatMapFunction.call(EvaluateFlatMapFunction.java:41)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:156)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:156)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1933)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1003)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:985)
at org.apache.spark.api.java.JavaRDDLike$class.reduce(JavaRDDLike.scala:375)
at org.apache.spark.api.java.AbstractJavaRDDLike.reduce(JavaRDDLike.scala:47)
at org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer.evaluate(SparkDl4jMultiLayer.java:629)
at org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer.evaluate(SparkDl4jMultiLayer.java:607)
at org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer.evaluate(SparkDl4jMultiLayer.java:597)
at deep.deepbeliefclassifier.DeepBeliefNetwork.trainModel(DeepBeliefNetwork.java:117)
at deep.deepbeliefclassifier.DataInput.main(DataInput.java:105)
graffo

1 Answer

Could you give us the version you're using, and the like? I wrote part of the internals there. It sounds like the parameters aren't being sent correctly.

The parameters are one long vector that represents the coefficients of the model.
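As a toy illustration of that idea (this is not dl4j's actual code), the per-layer coefficient arrays can be pictured as concatenated into one flat vector, and the exception above fires when the worker-side network's vector has a different length than the broadcast one:

```java
// Toy sketch: flatten per-layer coefficient arrays into a single vector,
// then compare lengths the way the evaluation check conceptually does.
public class ParamVector {

    // Concatenate several layer arrays into one flat coefficient vector.
    static double[] flatten(double[][] layers) {
        int total = 0;
        for (double[] layer : layers) total += layer.length;
        double[] flat = new double[total];
        int pos = 0;
        for (double[] layer : layers) {
            System.arraycopy(layer, 0, flat, pos, layer.length);
            pos += layer.length;
        }
        return flat;
    }

    // Mimics the "same number of parameters" consistency check.
    static boolean sameParamCount(double[] local, double[] broadcast) {
        return local.length == broadcast.length;
    }
}
```

In practice the counts only diverge when the network being evaluated was built with different layer sizes (nIn/nOut) than the network whose parameters were broadcast.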

Deeplearning4j uses parameter averaging of the vectors for scaling deep learning across executors. Also make sure you're using Kryo. In the latest version we give a warning about this.
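Parameter averaging can be sketched like this (again, just a toy model of the idea, not dl4j internals): each executor trains its own copy of the network, and the driver averages the coefficient vectors elementwise to produce the next global model:

```java
// Toy sketch of parameter averaging: each row is one executor's
// coefficient vector; the result is their elementwise mean.
public class ParamAveraging {

    static double[] average(double[][] executorParams) {
        int n = executorParams[0].length;
        double[] avg = new double[n];
        for (double[] p : executorParams) {
            for (int i = 0; i < n; i++) avg[i] += p[i];
        }
        for (int i = 0; i < n; i++) avg[i] /= executorParams.length;
        return avg;
    }
}
```

Enabling Kryo itself is a standard Spark setting, e.g. `conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")` on your `SparkConf`; check the dl4j Spark docs for any registrator the library expects on top of that.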

Adam Gibson
  • Thanks for the answer. I'll try Kryo; I hadn't heard of it yet. So should I specify the number of parameters somehow? – graffo Aug 05 '16 at 09:40
  • No, you can't do that. FWIW, if you are still having problems please see our links on deeplearning4j.org https://deeplearning4j.org/spark – Adam Gibson Aug 07 '16 at 12:41
  • Thanks Adam. I tried the new version and the problem doesn't occur anymore; the new API makes training with JavaRDDs much easier. – graffo Aug 08 '16 at 12:56