
I just tried to use the Apache Spark ML library for logistic regression, but whenever I ran it I got an error message such as:

"ERROR OWLQN: Failure! Resetting history: breeze.optimize.NaNHistory: "

An example of the data set for the logistic regression is as follows:

+-----+---------+---------+---------+--------+-------------+
|state|dayOfWeek|hourOfDay|minOfHour|secOfMin|     features|
+-----+---------+---------+---------+--------+-------------+
|  1.0|      7.0|      0.0|      0.0|     0.0|(4,[0],[7.0])|

And here is the code for the logistic regression:

//Data Set
StructType schema = new StructType(
new StructField[]{
    new StructField("state", DataTypes.DoubleType, false, Metadata.empty()),
    new StructField("dayOfWeek", DataTypes.DoubleType, false, Metadata.empty()),
    new StructField("hourOfDay", DataTypes.DoubleType, false, Metadata.empty()),
    new StructField("minOfHour", DataTypes.DoubleType, false, Metadata.empty()),
    new StructField("secOfMin", DataTypes.DoubleType, false, Metadata.empty())
});
List<Row> dataFromRDD = bucketsForMLs.map(p -> {
    return RowFactory.create(p.label(), p.features().apply(0), p.features().apply(1), p.features().apply(2), p.features().apply(3));
}).collect();

Dataset<Row> stateDF = sparkSession.createDataFrame(dataFromRDD, schema);
String[] featureCols = new String[]{"dayOfWeek", "hourOfDay", "minOfHour", "secOfMin"};
VectorAssembler vectorAssembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features");
Dataset<Row> stateDFWithFeatures = vectorAssembler.transform(stateDF);

StringIndexer labelIndexer = new StringIndexer().setInputCol("state").setOutputCol("label");
Dataset<Row> stateDFWithLabelAndFeatures = labelIndexer.fit(stateDFWithFeatures).transform(stateDFWithFeatures);

MLRExecutionForDF mlrExe = new MLRExecutionForDF(javaSparkContext);
mlrExe.execute(stateDFWithLabelAndFeatures);

// Logistic Regression part
LogisticRegressionModel lrModel = new LogisticRegression()
    .setMaxIter(maxItr)
    .setRegParam(regParam)
    .setElasticNetParam(elasticNetParam)
    // The error occurs here, at fit()
    .fit(stateDFWithLabelAndFeatures);
wkwk

1 Answer


I just ran into the same error. It comes from the Breeze (ScalaNLP) package, which Spark imports. It says something about the derivative not being computable.
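
As far as I know, Spark's LogisticRegression only uses Breeze's OWLQN optimizer when elasticNetParam is greater than 0 (i.e. there is an L1 component); with elasticNetParam = 0.0 it falls back to plain L-BFGS. A quick way to check whether the message is tied to the L1 part is to try pure L2 regularization. Rough sketch, with placeholder parameter values rather than the ones from the question:

import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;

// elasticNetParam = 0.0 means pure L2 regularization, which Spark optimizes
// with L-BFGS instead of OWLQN, so the OWLQN warning should not appear.
LogisticRegression lr = new LogisticRegression()
    .setMaxIter(100)          // placeholder value
    .setRegParam(0.01)        // placeholder value
    .setElasticNetParam(0.0); // 0.0 = L2 only -> L-BFGS

LogisticRegressionModel lrModel = lr.fit(stateDFWithLabelAndFeatures);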

I'm not sure what this means exactly, but with my dataset I could argue that the less data I used, the more often the error was thrown. In other words, the higher the proportion of missing feature information for a class to be trained on, the more often the error happened. I think it has something to do with the optimizer not being able to optimize correctly because of the missing information for a class.
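
If you want to see whether that is happening with your data, one rough diagnostic (reusing the DataFrame name from the question) is to count how many rows each label actually has, and whether the raw feature columns collapse to a single value for some class:

import static org.apache.spark.sql.functions.max;
import static org.apache.spark.sql.functions.min;

// How many examples does each class have?
stateDFWithLabelAndFeatures.groupBy("label").count().show();

// Do some classes have essentially constant features (min == max)?
stateDFWithLabelAndFeatures
    .groupBy("label")
    .agg(min("dayOfWeek"), max("dayOfWeek"),
         min("hourOfDay"), max("hourOfDay"))
    .show();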

Nonetheless, the error does not seem to stop the code from running.
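
If you want to confirm that the fit actually finished despite the warning, you can inspect the fitted model afterwards. This is only a sketch: coefficientMatrix()/interceptVector() assume Spark 2.1 or later, and depending on your Spark version the training summary may only be available for binary models.

// The model object is still produced even when the warning is logged.
System.out.println("Coefficients: " + lrModel.coefficientMatrix());
System.out.println("Intercepts:   " + lrModel.interceptVector());

// If a training summary is available, the objective history shows whether
// the loss actually decreased during optimization.
double[] history = lrModel.summary().objectiveHistory();
for (double objective : history) {
    System.out.println("objective = " + objective);
}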

ostritze