
I am testing LogisticRegression performance on synthetically generated data. The weights I have as input are

   w = [2, 3, 4]

with no intercept and three features. After training on 1000 synthetically generated data points, with each feature drawn from a standard normal distribution, the Spark LogisticRegression model I obtain has the weights

 [6.005520656096823,9.35980263762698,12.203400879214152]

I can see that each weight is scaled by a factor close to 3 with respect to the original values, and I am unable to guess the reason behind this. The code is simple enough:

 import org.apache.spark.ml.classification.LogisticRegression

 /*
  * Logistic Regression model
  */
 val lr = new LogisticRegression()
  .setMaxIter(50)
  .setRegParam(0.001)
  .setElasticNetParam(0.95)  // mostly L1 (lasso) penalty
  .setFitIntercept(false)

 val lrModel = lr.fit(trainingData)

 // learned weight vector (exposed as coefficients in newer Spark versions)
 println(s"${lrModel.weights}")
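For completeness, trainingData is not shown above; the ml LogisticRegression estimator expects a DataFrame with a Double "label" column and a Vector "features" column. A minimal sketch of constructing one is below (the values, and the sqlContext handle, are illustrative only; the mllib Vectors import assumes a Spark 1.x-style setup, consistent with weights being available on the model):

 import org.apache.spark.mllib.linalg.Vectors

 // Hypothetical construction of trainingData: a "label" column (Double)
 // and a "features" column (Vector), which is what lr.fit expects.
 val trainingData = sqlContext.createDataFrame(Seq(
   (1.0, Vectors.dense(0.5, -1.2, 0.3)),
   (0.0, Vectors.dense(-0.7, 0.1, -2.0))
 )).toDF("label", "features")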

I would greatly appreciate it if someone could shed some light on what's fishy here.

with kind regards, Nikhil


1 Answer


I figured out the issue: I was a victim of perfect separability. My sampler wasn't working properly, so the resulting labels were completely deterministic. As a consequence, logistic regression overfitted the training data: when the classes are perfectly separable, the likelihood keeps improving as the weights grow, so the learned weights are limited only by the regularization and come out as a scaled-up multiple of the true ones.
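
For reference, here is a minimal, self-contained sketch (plain Scala; the names are illustrative and not from my original code) of the difference between the broken deterministic labelling and the intended Bernoulli sampling:

 import scala.util.Random

 val rng = new Random(42)
 val w   = Array(2.0, 3.0, 4.0)                              // true weights
 val x   = Array.fill(3)(rng.nextGaussian())                 // one feature vector
 val z   = w.zip(x).map { case (wi, xi) => wi * xi }.sum     // w . x
 val p   = 1.0 / (1.0 + math.exp(-z))                        // sigmoid(w . x)

 // Broken sampler: deterministic threshold, so the classes are perfectly
 // separable and the fitted weights grow until regularization stops them.
 val yDeterministic = if (z > 0) 1.0 else 0.0

 // Intended sampler: Bernoulli draw, so labels overlap near the decision
 // boundary and the fitted weights stay close to [2, 3, 4].
 val yBernoulli = if (rng.nextDouble() < p) 1.0 else 0.0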

Nikhil J Joshi