I am using Apache Spark to perform logistic regression with LBFGS. I am trying to generate learning curves to see whether my model suffers from high bias or high variance.
Andrew Ng discusses the usefulness of learning curves in the Learning Curves lecture of his Machine Learning Coursera course. To build them, I need the loss (a.k.a. cost or error) reported by the optimization routine.
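Concretely, the procedure is: train on increasingly large subsets of the training data and record the error both on the training subset and on a fixed cross-validation set. A minimal plain-Scala sketch of that loop (the `fit`/`error` functions here are made-up stand-ins for illustration, not Spark code):

```scala
val trainingSet = (1 to 100).map(_.toDouble).toVector
val cvSet = (1 to 30).map(_.toDouble).toVector

// Hypothetical stand-ins: a real version would run the optimizer on `sample`
// and measure the fitted model's loss on each data set.
def fit(sample: Vector[Double]): Double => Double = x => x / sample.size
def error(model: Double => Double, data: Vector[Double]): Double =
  data.map(model).sum / data.size

// One (m, trainingError, cvError) point per training-set size m.
val learningCurve = for (m <- Seq(10, 25, 50, 100)) yield {
  val model = fit(trainingSet.take(m))
  (m, error(model, trainingSet.take(m)), error(model, cvSet))
}
```

Plotting trainingError and cvError against m is what shows the bias/variance gap.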
The Apache Spark source contains the following in `LBFGS.scala`:
```scala
@DeveloperApi
object LBFGS extends Logging {
  // ... some code

  def runLBFGS(/* ...some params... */): (Vector, Array[Double]) = {
    val lossHistory = mutable.ArrayBuilder.make[Double]
    // ... more code
    var state = states.next()
    while (states.hasNext) {
      lossHistory += state.value
      state = states.next()
    }
    lossHistory += state.value

    val lossHistoryArray = lossHistory.result()
    logInfo("LBFGS.runLBFGS finished. Last 10 losses %s".format(
      lossHistoryArray.takeRight(10).mkString(", ")))

    (weights, lossHistoryArray)
  }
}
```
I can see these losses displayed in the logs, but I am not sure how to obtain them programmatically via `new LogisticRegressionWithLBFGS().run()`.
My attempt:

```scala
val (model, lossHistoryArray) = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(learningSample)
```
However, I get the error:

```
constructor cannot be instantiated to expected type;
[error] found   : (T1, T2)
[error] required: org.apache.spark.mllib.classification.LogisticRegressionModel
```
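For what it's worth, the error makes sense in isolation: a tuple pattern on the left of a `val` requires the right-hand side to actually be a `Tuple2`. A tiny plain-Scala reproduction of the situation (the names are stand-ins for the Spark types):

```scala
case class Model(weights: Array[Double])          // stand-in for LogisticRegressionModel

// The low-level optimizer returns a (weights, lossHistory) pair...
def runLBFGS(): (Model, Array[Double]) =
  (Model(Array(0.1, 0.2)), Array(1.0, 0.5, 0.25))

// ...but the high-level run() discards the loss history and returns only the model.
def run(): Model = runLBFGS()._1

val (model, lossHistory) = runLBFGS()             // compiles: the RHS is a Tuple2
// val (m, l) = run()                             // same error as above: Model is not a (T1, T2)
```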
The cause is obvious: `run()` returns only a `LogisticRegressionModel`, not the `(Vector, Array[Double])` pair that `runLBFGS` produces, so the loss history is thrown away before it reaches me. What I am not sure about is how to get at that information, because it seems to be nested very deep within the API.
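The closest I have found is to bypass `LogisticRegressionWithLBFGS` entirely and call the low-level optimizer myself, roughly like this (a sketch based on the `runLBFGS` signature above; the `LogisticGradient`/`SquaredL2Updater` choices, the bias-appending step, and the parameter values are my assumptions about what `run()` does internally, and I have not verified this end to end):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

// learningSample: RDD[LabeledPoint], the same data passed to run() above.
val training = learningSample
  .map(p => (p.label, MLUtils.appendBias(p.features)))
  .cache()
val numFeatures = learningSample.first().features.size

// runLBFGS returns the pair directly, so the loss history survives.
val (weightsWithIntercept, lossHistoryArray) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  10,     // numCorrections
  1e-4,   // convergenceTol
  100,    // maxNumIterations
  0.0,    // regParam
  Vectors.dense(new Array[Double](numFeatures + 1)))  // initial weights (+1 for intercept)
```

If that is really the intended route, it seems like a lot of machinery just to read back the losses, so I would welcome a simpler way.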