linear regression with spark: wrong prediction

Asked Oct 20 '15 at 13:32

Active Oct 20 '15 at 22:17

Viewed 207 times

I am trying to run a linear regression with Spark, but it gives me really wrong predictions:

The data source:

The program:

from pyspark.mllib.regression import LinearRegressionWithSGD


def linear_regression(data):
    """
    Run the linear regression algorithm on the data to perform the prediction.

    `data` is expected to be an RDD of LabeledPoint.
    """
    # Build the model
    model = LinearRegressionWithSGD.train(data, iterations=100, step=0.1, intercept=True)
    # Pair each real label with the model's prediction for its features
    real_and_predicted = data.map(lambda p: (p.label, model.predict(p.features)))
    real_and_predicted = real_and_predicted.collect()

    return model, real_and_predicted

The result:

The results are really wrong! Is there a problem in my code?
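
For context on the `step` argument in the call above: it is the learning rate for gradient descent, and a value that is too large for unscaled features can make the weights blow up. The following is a minimal sketch in plain Python, not Spark's implementation; the `gradient_descent` helper and the data are made up for illustration and are not the data from this question.

```python
def gradient_descent(xs, ys, step, iterations):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Residuals of the current fit.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= step * grad_w
        b -= step * grad_b
    return w, b


# Hypothetical data on the line y = 2x + 10, with large, unscaled feature values.
xs = [100.0, 200.0, 300.0, 400.0, 500.0]
ys = [2.0 * x + 10.0 for x in xs]

w_large, b_large = gradient_descent(xs, ys, step=0.1, iterations=20)
w_small, b_small = gradient_descent(xs, ys, step=1e-6, iterations=200)

print("step=0.1  ->", w_large, b_large)  # weights diverge
print("step=1e-6 ->", w_small, b_small)  # slope settles near 2
```

In this toy run the large step produces enormous weights, while the small step recovers a slope close to 2 but leaves the intercept far from 10 after 200 iterations, so scaling the features or tuning `step` may be worth trying.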

edited Jun 20 '20 at 09:12

Community

asked Oct 20 '15 at 13:32

rom


Thanks @zero323 for the link. I had to change the step to `step=0.0005` in my case. Larger steps give negative and very large values, while smaller steps give a lower `correlation coefficient`. Even with `step=0.0005`, the `correlation coefficient` is only `0.67`, which is not really good :(. – rom Oct 20 '15 at 14:52
Well, if you can open your data in a spreadsheet, you can easily use a closed-form solution :) There is no reason to use SGD. – zero323 Oct 20 '15 at 22:23
I can't process it in a spreadsheet unfortunately :(. What is a closed form solution? If I don't use `SGD`, what should I use? – rom Oct 21 '15 at 08:46
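
For reference, a "closed-form solution" here means computing the least-squares coefficients directly from a formula instead of iterating as SGD does. A minimal sketch in plain Python, assuming a single feature (the question does not say how many features the data has); `fit_line_closed_form` and the sample data are made up for illustration:

```python
def fit_line_closed_form(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, with no iterations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 10.
xs = [100.0, 200.0, 300.0, 400.0, 500.0]
ys = [2.0 * x + 10.0 for x in xs]

slope, intercept = fit_line_closed_form(xs, ys)
print(slope, intercept)
```

There is no step size to tune here. The sums are simple aggregates, so on a dataset too large for a spreadsheet they could in principle be computed in a distributed way as well.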

0 Answers