I have created a multiple linear regression model on some data (Seattle-area housing prices) with GraphLab Create, and another one with Scikit-Learn. The training and test sets are chosen at random, but I used the same 80/20 split ratio for both. However, the results are very different.
The mean error is 106254.49 for the GraphLab model and 168980.44 for the Scikit-Learn model.
The code for the GraphLab model comes from an online course, so I assume it is correct. The code I wrote for the Scikit-Learn model is:
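```python
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(train_features, train_target)
test_predictions = model.predict(test_features)
errors = abs(test_predictions - test_target)  # absolute error for each test row
mean_error = errors.mean()
```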
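For a like-for-like comparison it can help to fix the split with a seed, so the exact same rows could be reused for the other library, and to cross-check the hand-computed error against `sklearn.metrics.mean_absolute_error`. A minimal sketch, assuming synthetic stand-in data (the feature matrix, coefficients and noise level below are hypothetical, not the housing data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: ~21k rows, 3 numeric features
rng = np.random.default_rng(0)
features = rng.normal(size=(21000, 3))
target = features @ np.array([200.0, -50.0, 30.0]) + 500.0 + rng.normal(scale=10.0, size=21000)

# 80/20 split with a fixed seed, so the same rows can be reproduced later
train_features, test_features, train_target, test_target = train_test_split(
    features, target, test_size=0.2, random_state=0)

model = LinearRegression().fit(train_features, train_target)
test_predictions = model.predict(test_features)

# Hand-computed mean absolute error vs. scikit-learn's helper
errors = abs(test_predictions - test_target)
manual_mae = errors.mean()
sklearn_mae = mean_absolute_error(test_target, test_predictions)
print(manual_mae, sklearn_mae)
```

The two printed values should agree; if the GraphLab figure is computed some other way (for example as a different error metric), the numbers would not be directly comparable.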
I understand that the data used by the two models is not exactly the same, since each split was drawn at random, but with a training set of about 17k rows and a test set of about 4k rows I wouldn't expect a big difference.
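One way to test the intuition that random splits alone shouldn't matter much is to refit over several seeds and look at the spread of the test error. A sketch under the same assumption of synthetic stand-in data (hypothetical; the real feature matrix and target would be substituted):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

def split_mae_spread(features, target, seeds):
    """Fit on an 80/20 split for each seed and return the test MAEs."""
    maes = []
    for seed in seeds:
        X_tr, X_te, y_tr, y_te = train_test_split(
            features, target, test_size=0.2, random_state=seed)
        model = LinearRegression().fit(X_tr, y_tr)
        maes.append(mean_absolute_error(y_te, model.predict(X_te)))
    return np.array(maes)

# Hypothetical stand-in data with ~21k rows, as in the question
rng = np.random.default_rng(1)
features = rng.normal(size=(21000, 3))
target = features @ np.array([200.0, -50.0, 30.0]) + 500.0 + rng.normal(scale=10.0, size=21000)

maes = split_mae_spread(features, target, seeds=range(10))
print(maes.mean(), maes.std(), maes.max() / maes.min())
```

If the spread on the real data turns out small compared with the gap between 106254.49 and 168980.44, the split is unlikely to be the cause, and the difference more plausibly lies in how the two models are set up or which features they receive.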
Any suggestions? Am I doing something wrong with the Scikit-Learn linear regression?
In short, I would like to replicate the GraphLab model with Scikit-Learn and get very similar performance.
Thanks