Input Spark Dataframe to DeepLearning4J model

Question

I have data in a Spark DataFrame (df) with 24 feature columns; the 25th column is my target variable. I want to fit my dl4j model on this dataset, but the model takes input in the form of org.nd4j.linalg.api.ndarray.INDArray, org.nd4j.linalg.dataset.DataSet or org.nd4j.linalg.dataset.api.iterator.DataSetIterator. How can I convert my DataFrame to one of the required types?

I've also tried the Pipeline approach to feed the Spark DataFrame to the model directly, but the sbt dependency for dl4j-spark-ml is not working. My build.sbt file is:

scalaVersion := "2.11.8"

libraryDependencies += "org.deeplearning4j" %% "dl4j-spark-ml" % "0.8.0_spark_2-SNAPSHOT"

libraryDependencies += "org.deeplearning4j" % "deeplearning4j-core" % "0.8.0"

libraryDependencies += "org.nd4j" % "nd4j" % "0.8.0"

libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "0.8.0"

libraryDependencies += "org.nd4j" % "nd4j-backends" % "0.8.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.1"

Can someone guide me from here? Thanks in advance.

score 0 · Accepted Answer · answered Jun 13 '17 at 10:58

You can use the snapshots, which have re-added the spark.ml integration. To use snapshots, add the OSS Sonatype repository: https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/pom.xml#L16 The version at the time of this writing is 0.8.1-SNAPSHOT.

Please verify the latest version against the examples repo, though: https://github.com/deeplearning4j/dl4j-examples/blob/master/pom.xml#L21

You can't mix versions of dl4j. The version you're trying to use is very out of date (by more than a year). Please upgrade to the latest version.

The new spark.ml integration examples can be found here: https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/test/java/org/deeplearning4j/spark/ml/impl

Make sure to add the proper dependency, which is typically something like org.deeplearning4j:dl4j-spark-ml_${YOUR SCALA BINARY VERSION}:0.8.1_spark_${YOUR SPARK VERSION (1 or 2)}-SNAPSHOT
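
For sbt users, the advice above might translate into something like the following build.sbt. This is only a sketch under assumptions: the resolver name is arbitrary, the shared 0.8.1-SNAPSHOT version for deeplearning4j-core and nd4j-native-platform is assumed from the snapshot version mentioned above and should be checked against the examples repo, and the org.nd4j "nd4j" and "nd4j-backends" entries are left out on the assumption that nd4j-native-platform is enough to pull in a backend.

```scala
// build.sbt (sketch; the version numbers are assumptions, verify them first)
scalaVersion := "2.11.8"

// Snapshot artifacts are not published to Maven Central, so the
// Sonatype snapshots repository has to be added explicitly.
resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/"

// Assumed: one shared DL4J/ND4J version, so that no versions are mixed.
val dl4jVersion = "0.8.1-SNAPSHOT"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, which resolves to dl4j-spark-ml_2.11
  "org.deeplearning4j" %% "dl4j-spark-ml"        % "0.8.1_spark_2-SNAPSHOT",
  "org.deeplearning4j" %  "deeplearning4j-core"  % dl4jVersion,
  "org.nd4j"           %  "nd4j-native-platform" % dl4jVersion,
  "org.apache.spark"   %% "spark-core"           % "2.0.1",
  "org.apache.spark"   %% "spark-sql"            % "2.0.1"
)
```

Keeping the DL4J and ND4J version in a single val makes it harder to end up with mixed versions by accident.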

Adam Gibson


I've tried using `"org.deeplearning4j" %% "dl4j-spark-ml" % "0.8.0_spark_2-SNAPSHOT"`, but it's still not working. I've edited my question to include the complete build.sbt file. Please check it once. – Ishan Jun 13 '17 at 11:26
I said 0.8.*1*. Let me be more explicit: https://oss.sonatype.org/content/repositories/snapshots/org/deeplearning4j/dl4j-spark-ml_2.11/0.8.1_spark_2-SNAPSHOT/ It's definitely there. You shouldn't be running into any problems. ^^^ – Adam Gibson Jun 13 '17 at 12:54
I had to use `resolvers += "scala-tools.org" at "https://oss.sonatype.org/content/repositories/snapshots/"` to make it work. Thanks for the information. But what I want is to use org.deeplearning4j.spark.ml.classification.NeuralNetworkClassification, and it seems that this class is not available. Do you have any idea about it? All I want is to input my Spark DataFrame to the dl4j model. – Ishan Jun 14 '17 at 11:13
That's... what this new release does? I'm a bit confused as to the problem here. https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/test/java/org/deeplearning4j/spark/ml/impl/SparkDl4jNetworkTest.java#L42 Beyond that, we don't support, and won't support in the future, anything related to what you're looking at now. It's more than a year old. I can tell you this as the creator of deeplearning4j. – Adam Gibson Jun 14 '17 at 11:17
