I have been trying to build a regression model on Spark using some custom data, and the intercept and weights are always nan.
This is my data:
data = [LabeledPoint(0.0, [27022.0]), LabeledPoint(1.0, [27077.0]), LabeledPoint(2.0, [27327.0]), LabeledPoint(3.0, [27127.0])]
Output:
(weights=[nan], intercept=nan)
However, if I use this dataset (taken from the Spark examples), it returns non-nan weights and a non-nan intercept.
data = [LabeledPoint(0.0, [0.0]), LabeledPoint(1.0, [1.0]), LabeledPoint(3.0, [2.0]), LabeledPoint(2.0, [3.0])]
Output:
(weights=[0.798729902914], intercept=0.3027117101297481)
This is my current code:
model = LinearRegressionWithSGD.train(sc.parallelize(data), intercept=True)
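For completeness, the whole thing boils down to the snippet below (sc is the SparkContext from the PySpark shell, and data is the list shown above):

from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

# Parallelize the small local list and train with the default SGD settings,
# fitting an intercept term.
model = LinearRegressionWithSGD.train(sc.parallelize(data), intercept=True)

# Printing the model is where I see the nan values.
print(model)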
Am I missing something? Is it because the numbers in my data are so big? It is my first time using MLlib, so I might be missing some details.
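In case it helps, this is what I was planning to try next, on the guess that the problem is the magnitude of the feature values (I have no idea whether this is the right fix): either standardize the feature with StandardScaler before training, or shrink the SGD step size.

from pyspark.mllib.feature import StandardScaler
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

rdd = sc.parallelize(data)
labels = rdd.map(lambda p: p.label)
features = rdd.map(lambda p: p.features)

# Guess 1: standardize the feature column, then train on the rescaled points.
scaler = StandardScaler(withMean=True, withStd=True).fit(features)
scaled = labels.zip(scaler.transform(features)).map(lambda lf: LabeledPoint(lf[0], lf[1]))
model1 = LinearRegressionWithSGD.train(scaled, intercept=True)

# Guess 2: keep the raw features but use a much smaller step size (arbitrary value)
# and more iterations.
model2 = LinearRegressionWithSGD.train(rdd, step=1e-9, iterations=1000, intercept=True)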
Thanks