Load the data that you want to score. The data is stored in libsvm format as follows: label index1:value1 index2:value2 ... (the indices are one-based and in ascending order). Here is a sample line:
100 10:1 11:1 208:1 400:1 1830:1

    import org.apache.spark.mllib.classification.LogisticRegressionModel
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.rdd.RDD

    val unseendata: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, unseendatafileName)
    val scores_path = results_base + run_id + "/" + "-scores"

    // Load the saved model. I had saved it after training using the save
    // method. Here is the metadata for that model
    // (LogisticRegressionModel/mymodel/metadata/part-00000):
    // {"class":"org.apache.spark.mllib.classification.LogisticRegressionModel","version":"1.0","numFeatures":176894,"numClasses":2}
    val lrm = LogisticRegressionModel.load(sc, "logisticregressionmodels/mymodel")

    // Evaluate the model on unseen data
    val valuesAndPreds = unseendata.map { point =>
      val prediction = lrm.predict(point.features)
      (point.label, prediction)
    }

    // Store the scores
    valuesAndPreds.saveAsTextFile(scores_path)

Here is the error message that I get:

16/04/28 10:22:07 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5, ): java.lang.IllegalArgumentException: requirement failed at scala.Predef$.require(Predef.scala:221) at org.apache.spark.mllib.classification.LogisticRegressionModel.predictPoint(LogisticRegression.scala:105) at org.apache.spark.mllib.regression.GeneralizedLinearModel.predict(GeneralizedLinearAlgorithm.scala:76)

1 Answer

The code that throws the exception is require(dataMatrix.size == numFeatures).

My guess is that the model was fit with 176894 features (see "numFeatures":176894 in the model's metadata), while the highest feature index in your libsvm file is 1830, so loadLibSVMFile infers a dimension of only 1830. The two numbers must match.
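
You can confirm the mismatch yourself; a quick sketch, assuming the same sc, file name, and loaded lrm as in your code:

    // Without an explicit numFeatures argument, loadLibSVMFile infers the
    // dimension from the largest feature index it sees in the file.
    val inferred = MLUtils.loadLibSVMFile(sc, unseendatafileName)
    println(inferred.first().features.size)  // 1830 here, not 176894
    println(lrm.numFeatures)                 // 176894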

Change the line where you load the libsvm file to:

val unseendata = MLUtils.loadLibSVMFile(sc, unseendatafileName, 176894)
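
If you'd rather not hard-code the number, the loaded model already carries the expected dimension, so (assuming you load the model before the data) you can pass it along:

    // Take the expected dimension from the model itself so the two can
    // never drift apart.
    val unseendata = MLUtils.loadLibSVMFile(sc, unseendatafileName, lrm.numFeatures)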