I have a function f: R² → R that takes two parameters (a and b) and returns a scalar. I would like to use an optimizer to find the values of a and b for which the value returned by f is maximized (or minimized; I can work with -f).
I have looked into the LBFGS optimizer from MLlib, see:
- the doc at https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS and https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS$
- an example for logistic regression at https://spark.apache.org/docs/2.1.0/mllib-optimization.html
My issue is that I am not sure I fully understand how this optimizer works.
The optimizers I have seen before in Python and R usually expect the following: an implementation of the objective function, a set of initial values for the parameters of the objective function, and optionally additional arguments for the objective function, bounds on the domain within which the parameters should be searched, etc.
Usually, the optimizer invokes the function iteratively, starting from the initial parameters provided by the user, and computes a gradient until the value returned by the objective function (or the loss) has converged. It then returns the best set of parameters and the corresponding value of the objective function. Pretty standard stuff.
In this case, I see that `org.apache.spark.mllib.optimization.LBFGS.runLBFGS` expects to be given an RDD of labeled data and a gradient.
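For reference, here is the signature of `runLBFGS` as I understand it from the linked scaladoc (reproduced from my reading of that page, so it may not be exact):

```scala
def runLBFGS(
    data: RDD[(Double, Vector)],      // labeled data: (label, features) pairs
    gradient: Gradient,
    updater: Updater,
    numCorrections: Int,
    convergenceTol: Double,
    maxNumIterations: Int,
    regParam: Double,
    initialWeights: Vector): (Vector, Array[Double])
```

The return value appears to be the final weights together with the loss history, which is part of what I am trying to confirm below.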
- What is this `data` RDD argument that the optimizer is expecting?
- Is the `gradient` argument an implementation of the gradient of the objective function? If I am to code my own gradient for my own objective function, how should the loss be calculated (as the ratio of the values returned by the objective function at iterations n and n-1)?
- What is the `initialWeights` argument? Is it an array containing the initial values of the parameters to be optimized?
- Ideally, would you be able to provide a very simple code example showing how a simple objective function can be optimized using `org.apache.spark.mllib.optimization.LBFGS.runLBFGS`?
- Finally, could Breeze be an interesting alternative? See https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/package.scala
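To make the Breeze part of the question concrete, here is the kind of call I would hope to write, based on my reading of the `breeze.optimize` package (untested; the objective f(a, b) = (a - 1)² + (b + 2)² is just a toy function I made up, with its minimum at (1, -2)):

```scala
import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, minimize}

// Toy objective: f(a, b) = (a - 1)^2 + (b + 2)^2.
// DiffFunction.calculate returns both the value and the gradient at x.
val f = new DiffFunction[DenseVector[Double]] {
  def calculate(x: DenseVector[Double]): (Double, DenseVector[Double]) = {
    val a = x(0)
    val b = x(1)
    val value = math.pow(a - 1.0, 2) + math.pow(b + 2.0, 2)
    // Analytic gradient of f with respect to (a, b).
    val grad = DenseVector(2.0 * (a - 1.0), 2.0 * (b + 2.0))
    (value, grad)
  }
}

// Initial guess for (a, b); I would expect a result close to (1, -2).
val result = minimize(f, DenseVector(0.0, 0.0))
```

If this is roughly the intended usage, Breeze would match the objective-function-plus-initial-values pattern I described above much more closely than `runLBFGS` seems to.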
Thanks!