
I have been trying to build a regression model in Spark using some custom data, and the intercept and weights are always nan. This is my data:

data = [LabeledPoint(0.0, [27022.0]), LabeledPoint(1.0, [27077.0]), LabeledPoint(2.0, [27327.0]), LabeledPoint(3.0, [27127.0])]

Output:

(weights=[nan], intercept=nan)  

However, if I use this dataset (taken from the Spark examples), it returns a non-nan weight and intercept.

data = [LabeledPoint(0.0, [0.0]), LabeledPoint(1.0, [1.0]), LabeledPoint(3.0, [2.0]),LabeledPoint(2.0, [3.0])]

Output:

(weights=[0.798729902914], intercept=0.3027117101297481) 

This is my current code:

model = LinearRegressionWithSGD.train(sc.parallelize(data), intercept=True)

Am I missing something? Is it because the numbers in my data are that big? It is my first time using MLlib, so I might be missing some details.

Thanks

zero323
user3276768

1 Answer


MLlib linear regression is SGD based, so you need to tune the number of iterations and the step size; see https://spark.apache.org/docs/latest/mllib-optimization.html.

I tried your custom data like this and got some results (in Scala):

val numIterations = 20
val model = LinearRegressionWithSGD.train(sc.parallelize(data), numIterations)
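To see why the raw feature values around 27,000 cause nan, here is a minimal sketch in plain Python (no Spark), assuming ordinary batch gradient descent on a 1-D least-squares model, which is the same kind of update SGD performs. The `gd` helper and the standardization step are illustrative, not part of the MLlib API:

```python
# Plain gradient descent on y = w*x + b (mean squared error),
# mimicking the per-step update that SGD-based linear regression performs.
def gd(xs, ys, step, iters=100):
    w = b = 0.0
    n = float(len(xs))
    for _ in range(iters):
        gw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= step * gw
        b -= step * gb
    return w, b

xs = [27022.0, 27077.0, 27327.0, 27127.0]  # the custom data from the question
ys = [0.0, 1.0, 2.0, 3.0]

# With the raw features, the gradient carries a factor of x**2 (~7e8),
# so a step size of 1.0 wildly overshoots and the weights blow up.
w, b = gd(xs, ys, step=1.0)

# Standardizing the feature (or shrinking the step size accordingly)
# keeps the updates stable, and the same data converges.
mean = sum(xs) / len(xs)
std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
w2, b2 = gd([(x - mean) / std for x in xs], ys, step=0.5, iters=500)

print(w, b)    # not finite (nan/inf)
print(w2, b2)  # finite; intercept near the label mean 1.5
```

The same reasoning applies in Spark: either scale the features before training, or pass a much smaller `step` to `LinearRegressionWithSGD.train` when the raw feature values are large.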
selvinsource