By default, logistic regression training initializes all coefficients to zero. However, I would like to initialize the coefficients myself. This would be useful, for example, if a previous training run crashed after several iterations -- I could simply restart training from the last known set of coefficients.
Is this possible with any of the dataset/dataframe-based APIs, preferably Scala?
Looking at the Spark source code, it seems that there is a method setInitialModel to initialize the model and its coefficients, but it's unfortunately marked as private.
The RDD-based API seems to allow initializing coefficients: one of the overloads of LogisticRegressionWithSGD.run(...) accepts an initialWeights vector. However, I would like to use the dataset-based API instead of the RDD-based API because (1) the former supports elastic net regularization (I couldn't figure out how to do elastic net with the RDD-based logistic regression) and (2) the RDD-based API is in maintenance mode.
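For reference, here is a minimal sketch of the RDD-based warm start I have in mind. It uses LogisticRegressionWithLBFGS rather than LogisticRegressionWithSGD, since both inherit the same run(input, initialWeights) overload and the SGD constructor is deprecated or unavailable in newer Spark versions. The helper name resumeTraining, the iteration count, and the binary-classification setup are my own assumptions for illustration, and the length of lastWeights must match the number of features (no intercept is fitted by default).

```scala
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper: restart RDD-based training from the last known weights.
def resumeTraining(data: RDD[LabeledPoint], lastWeights: Vector): LogisticRegressionModel = {
  // Binary logistic regression; the RDD-based API does not fit an intercept by default.
  val algorithm = new LogisticRegressionWithLBFGS().setNumClasses(2)
  // Assumed iteration budget for the resumed run.
  algorithm.optimizer.setNumIterations(50)
  // This overload starts optimization from lastWeights instead of all zeros.
  algorithm.run(data, lastWeights)
}
```

It would be called as resumeTraining(trainingData, Vectors.dense(...)) with whatever coefficients were saved from the crashed run. What I am looking for is the equivalent of this in the dataset-based API.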
I could always try using reflection to call that private setInitialModel method, but I would like to avoid this if possible (and maybe that wouldn't even work... I also can't tell if setInitialModel is marked private for a good reason).