
I need some help understanding errors generated by the Scala class RandomForestAlgorithm.scala (https://github.com/PredictionIO/PredictionIO/blob/develop/examples/scala-parallel-classification/custom-attributes/src/main/scala/RandomForestAlgorithm.scala).

I am building the project as-is (the custom-attributes classification template) in PredictionIO and am getting a pio build error:

hduser@hduser-VirtualBox:~/PredictionIO/classTest$ pio build --verbose
[INFO] [Console$] Using existing engine manifest JSON at /home/hduser/PredictionIO/classTest/manifest.json
[INFO] [Console$] Using command '/home/hduser/PredictionIO/sbt/sbt' at the current working directory to build.
[INFO] [Console$] If the path above is incorrect, this process will fail.
[INFO] [Console$] Uber JAR disabled. Making sure lib/pio-assembly-0.9.5.jar is absent.
[INFO] [Console$] Going to run: /home/hduser/PredictionIO/sbt/sbt  package assemblyPackageDependency
[INFO] [Console$] [info] Loading project definition from /home/hduser/PredictionIO/classTest/project
[INFO] [Console$] [info] Set current project to template-scala-parallel-classification (in build file:/home/hduser/PredictionIO/classTest/)
[INFO] [Console$] [info] Compiling 1 Scala source to /home/hduser/PredictionIO/classTest/target/scala-2.10/classes...
[INFO] [Console$] [error] /home/hduser/PredictionIO/classTest/src/main/scala/RandomForestAlgorithm.scala:28: class RandomForestAlgorithm needs to be abstract, since method train in class P2LAlgorithm of type (sc: org.apache.spark.SparkContext, pd: com.test1.PreparedData)com.test1.PIORandomForestModel is not defined
[INFO] [Console$] [error]  class RandomForestAlgorithm(val ap: RandomForestAlgorithmParams) // CHANGED
[INFO] [Console$] [error]        ^
[INFO] [Console$] [error] one error found
[INFO] [Console$] [error] (compile:compile) Compilation failed
[INFO] [Console$] [error] Total time: 6 s, completed Jun 8, 2016 4:37:36 PM
[ERROR] [Console$] Return code of previous step is 1. Aborting.
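For what it's worth, the "needs to be abstract" message means the concrete class never implemented the inherited abstract `train(sc: SparkContext, pd: PreparedData)`: in Scala, a `train` with a different parameter list is an overload, not an implementation of the abstract method. A self-contained sketch with toy stand-in types (not the real PredictionIO API):

```scala
// Toy stand-ins for the Spark/PredictionIO types, for illustration only.
class Ctx
class Data

abstract class Base {
  // Abstract method: a concrete subclass must implement this exact signature.
  def train(sc: Ctx, data: Data): String
}

// This compiles only because the two-argument `train` matches the abstract
// signature. Defining only `def train(data: Data)` would be a new overload,
// leaving `train(sc, data)` unimplemented: the "needs to be abstract" error.
class Impl extends Base {
  def train(sc: Ctx, data: Data): String = "trained"
}
```

Going by the signature printed in the compile error, the likely fix is to add the `SparkContext` parameter, e.g. `def train(sc: SparkContext, data: PreparedData): PIORandomForestModel`, rather than marking the class abstract.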

So when I address the line causing the error and make it an abstract class:

// extends P2LAlgorithm because the MLlib's RandomForestModel doesn't
// contain RDD.
abstract class RandomForestAlgorithm(val ap: RandomForestAlgorithmParams) // CHANGED
  extends P2LAlgorithm[PreparedData, PIORandomForestModel, // CHANGED
  Query, PredictedResult] {

  def train(data: PreparedData): PIORandomForestModel = { // CHANGED
    // CHANGED
    // Empty categoricalFeaturesInfo indicates all features are continuous.
    val categoricalFeaturesInfo = Map[Int, Int]()
    val m = RandomForest.trainClassifier(
      data.labeledPoints,
      ap.numClasses,
      categoricalFeaturesInfo,
      ap.numTrees,
      ap.featureSubsetStrategy,
      ap.impurity,
      ap.maxDepth,
      ap.maxBins)
    new PIORandomForestModel(
      gendersMap = data.gendersMap,
      educationMap = data.educationMap,
      randomForestModel = m
    )
  }
}

pio build is now successful, but training fails because the (now abstract) algorithm class can no longer be instantiated:

[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(6))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[WARN] [Utils] Your hostname, hduser-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
[WARN] [Utils] Set SPARK_LOCAL_IP if you need to bind to another address
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.0.2.15:59444]
[WARN] [MetricsSystem] Using default name DAGScheduler for source because spark.app.id is not set.
Exception in thread "main" java.lang.InstantiationException
    at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at io.prediction.core.Doer$.apply(AbstractDoer.scala:52)
    at io.prediction.controller.Engine$$anonfun$1.apply(Engine.scala:171)
    at io.prediction.controller.Engine$$anonfun$1.apply(Engine.scala:170)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at io.prediction.controller.Engine.train(Engine.scala:170)
    at io.prediction.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:65)
    at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:247)
    at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
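The `InstantiationException` here is Java reflection refusing to construct an abstract class: per the stack trace, PredictionIO's `Doer$.apply` instantiates the algorithm via `Constructor.newInstance`, which an abstract class cannot satisfy. A minimal sketch of that failure mode (plain Scala, no PredictionIO):

```scala
// A constructor parameter, mirroring RandomForestAlgorithm(val ap: ...).
abstract class AbstractAlgo(val ap: Int)

// Reflective construction, roughly what Doer$.apply does: it throws
// InstantiationException because abstract classes cannot be instantiated.
def tryInstantiate(): String =
  try {
    classOf[AbstractAlgo].getConstructor(classOf[Int]).newInstance(Integer.valueOf(1))
    "ok"
  } catch {
    case _: InstantiationException => "InstantiationException"
  }
```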

So, two questions:

1. Why is the following model not considered defined during the build?

class PIORandomForestModel(
  val gendersMap: Map[String, Double],
  val educationMap: Map[String, Double],
  val randomForestModel: RandomForestModel
) extends Serializable
  2. How can I define PIORandomForestModel in a way that does not throw a pio build error and lets training assign new values to the model's attributes?
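On the second question, one observation: the val fields never need to be re-assigned. The model can stay exactly as written, with train constructing and returning a fresh instance on each run, as the template's train already does. A self-contained sketch (toy field standing in for the MLlib model):

```scala
// Toy model mirroring the shape of PIORandomForestModel: immutable val
// fields, set once through the constructor.
class Model(
  val gendersMap: Map[String, Double],
  val educationMap: Map[String, Double],
  val tag: String
) extends Serializable

// No field reassignment needed: build a new instance per training run.
def makeModel(genders: Map[String, Double]): Model =
  new Model(genders, Map("BS" -> 1.0), "forest")
```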

I have posted this question in the PredictionIO Google group but have not gotten a response. Thanks in advance for your help.

Patrick
  • Some update to my issue: after some research, it turns out Scala will not compile the reassignment because of the class definition `class PIORandomForestModel(val gendersMap: Map[String, Double], val educationMap: Map[String, Double], val randomForestModel: RandomForestModel) extends Serializable`; declaring the parameters as vals is what stops the reassignment later in the train function. Thanks. – Patrick Jun 10 '16 at 15:11

0 Answers