I am trying to run linear regression with Spark (using LinearRegressionWithSGD), but the predictions it gives are badly off.
The program:
    from pyspark.mllib.regression import LinearRegressionWithSGD

    def linear_regression(data):
        """
        Run the linear regression algorithm on the data to perform the prediction
        """
        # Build the model (data is an RDD of LabeledPoint)
        model = LinearRegressionWithSGD.train(data, iterations=100, step=0.1, intercept=True)
        # Pair each true label with the model's prediction for the same point
        real_and_predicted = data.map(lambda p: (p.label, model.predict(p.features)))
        real_and_predicted = real_and_predicted.collect()
        return model, real_and_predicted
The results are really wrong. Is there a problem in my code?
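To put a number on how far off the predictions are, the collected pairs can be reduced to a mean squared error. This is only a minimal sketch: it assumes `real_and_predicted` is the plain Python list of `(label, prediction)` tuples returned by `linear_regression` above, the helper name `mean_squared_error` is made up for illustration, and the sample pairs are invented values, not output from my model.

```python
def mean_squared_error(pairs):
    """Return the mean of squared differences over (label, prediction) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no (label, prediction) pairs given")
    return sum((label - pred) ** 2 for label, pred in pairs) / len(pairs)


# Invented pairs purely for illustration; replace with the list
# returned by linear_regression(data).
example_pairs = [(1.0, 1.5), (2.0, 1.0), (3.0, 3.0)]
example_mse = mean_squared_error(example_pairs)
print("MSE:", example_mse)
```

An error that is large relative to the spread of the labels would confirm that the fit itself is poor rather than just a few outlying points.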