How to apply the results of linear regression on a training set of data to a testing set of data?

Question

I have two non-empty data frames: training and testing. Each has two columns, Y and X, in that order. I fitted a linear regression to training as follows:

m <- lm(Y ~ X, data = training)

I would like to apply the coefficients resulting from this fitting to the data in testing to obtain the same types of information available in the object m for purposes of further analysis and data visualization. How can I do this?

Are you talking about something like `predict(lm(Y ~ X, data=training), newdata=testing)`? — r2evans, Nov 18 '14 at 07:20
@r2evans: Yes, thanks. If I understand correctly, `testing`'s `Y` column is simply ignored by the `predict` function, right? — Evan Aad, Nov 18 '14 at 08:32
@r2evans: But how does `predict` know to ignore `Y` rather than `X`? — Evan Aad, Nov 18 '14 at 08:50
When you start the regression with `lm(Y ~ X, ...)`, you are labeling `Y` as the response variable. The model retains this information, so then `predict()` knows this is the variable you are trying to predict based on the other variables (explanatory factors). — r2evans, Nov 18 '14 at 17:52
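The point made in the comments can be checked directly. The sketch below uses made-up toy data (the original data frames are not shown in the question) and compares predictions from `newdata` with and without the `Y` column. Columns are matched by name from the formula, not by position, so the column order in `testing` should not matter either.

```r
# Hypothetical toy data standing in for the asker's data frames
set.seed(1)
training <- data.frame(Y = rnorm(20), X = rnorm(20))
testing  <- data.frame(Y = rnorm(5),  X = rnorm(5))

m <- lm(Y ~ X, data = training)

# Predict once with the response column present, once with only X
with_y    <- predict(m, newdata = testing)
without_y <- predict(m, newdata = testing["X"])

# Compare the two prediction vectors
same <- isTRUE(all.equal(with_y, without_y))
print(same)
```

If `same` is `TRUE`, the `Y` column in `testing` played no part in the predictions.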

score 2 · Accepted Answer · answered Nov 18 '14 at 07:20

Use predict() with the newdata argument; for an lm object this dispatches to predict.lm:

Y_pred <- predict(m, newdata = testing)
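`predict` returns only fitted values, not a full model object, so test-set counterparts of the quantities in `m` have to be computed by hand. A minimal sketch, assuming simulated toy data in place of the asker's data frames; the names `pred`, `test_resid`, `test_rmse`, and `test_r2` are illustrative, not part of any API:

```r
# Hypothetical toy data with a known linear relationship
set.seed(42)
make_data <- function(n) {
  X <- runif(n)
  data.frame(Y = 2 + 3 * X + rnorm(n, sd = 0.5), X = X)
}
training <- make_data(50)
testing  <- make_data(20)

m <- lm(Y ~ X, data = training)

# Point predictions plus 95% prediction intervals;
# the result is a matrix with columns fit, lwr, upr
pred <- predict(m, newdata = testing, interval = "prediction", level = 0.95)

# Test-set residuals: observed response minus predicted value
test_resid <- testing$Y - pred[, "fit"]

# Out-of-sample error summaries
test_rmse <- sqrt(mean(test_resid^2))
test_r2   <- 1 - sum(test_resid^2) / sum((testing$Y - mean(testing$Y))^2)

print(head(pred))
print(c(rmse = test_rmse, r2 = test_r2))
```

For visualization, one option is `plot(testing$X, testing$Y)` followed by `abline(m)`, which overlays the line fitted on the training data onto the test points.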


user1808924
