Best way to make a linear regression model from a split .csv dataset?

Question

I'm generally quite new to Python, and I'm having trouble making a linear regression model. I need to build it from a training set and a test set taken from a large Excel dataset saved as a .csv file.

I've split the dataset already:

import pandas as pd
import numpy as np

df = pd.read_csv('C:/Dataset.csv')

# Boolean mask: True for roughly 75% of rows (training), False for the rest (testing)
split = np.random.rand(len(df)) <= 0.75

training_set = df[split]
testing_set = df[~split]
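Because the mask is drawn from np.random.rand without a seed, every run produces a different split. A minimal sketch of a reproducible version using NumPy's seeded Generator, with a small synthetic DataFrame standing in for C:/Dataset.csv (the columns x1, x2 and target are made up for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV (hypothetical columns)
rng = np.random.default_rng(seed=42)
df = pd.DataFrame({
    "x1": rng.normal(size=1000),
    "x2": rng.normal(size=1000),
})
df["target"] = 3 * df["x1"] - 2 * df["x2"] + rng.normal(scale=0.1, size=1000)

# Same idea as the question: a boolean mask that is True for ~75% of rows,
# but drawn from a seeded generator so the split is repeatable between runs
split = np.random.default_rng(seed=0).random(len(df)) <= 0.75

training_set = df[split]
testing_set = df[~split]

print(len(training_set), len(testing_set))
```

Re-running with the same seed gives exactly the same rows in each set, which makes results comparable while you experiment.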

How can I use this split data to make a linear regression model using the Mean Average Error?

Thanks.

Should that be Mean Absolute Error? – James Phillips Apr 30 '17 at 19:43

mforpe · Answer 1 · 2017-04-30T19:54:12.100

With scikit-learn this is straightforward:

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error

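Starting from the df and split mask in the question, split the features into training/testing sets. Here 'target' stands for the name of the column you want to predict (replace it with your own); it is dropped from the features so the model is not trained on the value it is supposed to predict. The remaining columns must be numeric.

X = df.drop(columns=['target'])
X_train = X[split]
X_test = X[~split]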

Split the target into training/testing sets

y_train = df.target[split]
y_test = df.target[~split]
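To check that the features and target were split consistently, a small sketch on a made-up frame (column names and values are hypothetical, and the mask is fixed so the result is deterministic) shows that applying the same mask to X and y keeps their rows aligned and that the target is no longer among the features:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two features and a target column
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2],
    "target": [2.0, 3.9, 6.1, 8.0, 9.8, 12.2],
})

# Fixed mask instead of a random one, so the example is deterministic
split = np.array([True, True, False, True, False, True])

X = df.drop(columns=["target"])
X_train, X_test = X[split], X[~split]
y_train, y_test = df.target[split], df.target[~split]

# The same mask selects the same rows, so the indices line up
print(list(X_train.index), list(y_train.index))  # [0, 1, 3, 5] [0, 1, 3, 5]
print("target" in X_train.columns)  # False
```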

Create a linear regression object

regr = linear_model.LinearRegression()

Train the model using the training sets

regr.fit(X_train, y_train)
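LinearRegression fits an ordinary least-squares model: it chooses the coefficients and intercept that minimise the sum of squared residuals on the training set. As an illustration only (not necessarily how scikit-learn computes it internally), the same fit can be sketched with NumPy's least-squares solver by appending a column of ones for the intercept. The data are synthetic, generated from known weights so the result can be checked:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: y = 1.5*x1 - 0.5*x2 + 4 (no noise)
X_train = rng.normal(size=(50, 2))
true_coef = np.array([1.5, -0.5])
true_intercept = 4.0
y_train = X_train @ true_coef + true_intercept

# Append a column of ones so the intercept is estimated as an extra weight
A = np.column_stack([X_train, np.ones(len(X_train))])

# Solve min ||A w - y||^2; lstsq also returns residuals, rank and singular values
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
coef, intercept = w[:-1], w[-1]

print(coef, intercept)  # approximately [1.5, -0.5] and 4.0
```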

Predict the target for the test set

y_pred = regr.predict(X_test)

Print the coefficients

print('Coefficients: \n', regr.coef_)
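regr.coef_ holds one weight per feature column (in the same order as the columns of X_train) and regr.intercept_ holds the constant term, so regr.predict is a weighted sum of the features plus the intercept. A small sketch with synthetic data, assuming scikit-learn and NumPy are installed, confirms this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic data with known weights [2, 0, -1], intercept 0.5 and a little noise
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([2.0, 0.0, -1.0]) + 0.5 + rng.normal(scale=0.01, size=100)
X_test = rng.normal(size=(10, 3))

regr = LinearRegression().fit(X_train, y_train)

# predict() is equivalent to X @ coef_ + intercept_
manual = X_test @ regr.coef_ + regr.intercept_
print(np.allclose(manual, regr.predict(X_test)))  # True
```

Reading the coefficients this way tells you how much the prediction changes per unit change in each feature, holding the others fixed.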

Print the mean absolute error

print("Mean absolute error: %.2f"
       % mean_absolute_error(y_test, y_pred))
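The mean absolute error is the average of the absolute differences between the true and predicted values, MAE = (1/n) * sum(|y_i - y_pred_i|), and it is expressed in the same units as the target. For a quick sanity check, here is a NumPy-only version on a tiny hand-made example (values chosen for illustration); for 1-D inputs it should give the same value as mean_absolute_error:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_true - y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Absolute errors are 0.5, 0.5, 0.0 and 1.0, so their mean is 0.5
print(mae(y_true, y_pred))  # 0.5
```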
