I have two non-empty dataframes: training and testing. Each of these dataframes has two columns: Y and X, in this order. I have applied linear regression analysis to training as follows:

m <- lm(Y ~ X, data = training)

I would like to apply the coefficients resulting from this fitting to the data in testing to obtain the same types of information available in the object m for purposes of further analysis and data visualization. How can I do this?

Evan Aad
  • Are you talking about something like `predict(lm(Y ~ X, data=training), newdata=testing)`? – r2evans Nov 18 '14 at 07:20
  • @r2evans: Yes, thanks. If I understand correctly, `testing`'s `Y` column is simply ignored by the `predict` function, right? – Evan Aad Nov 18 '14 at 08:32
  • Yes, that's my understanding. – r2evans Nov 18 '14 at 08:38
  • @r2evans: But how does `predict` know to ignore `Y` rather than `X`? – Evan Aad Nov 18 '14 at 08:50
  • When you start the regression with `lm(Y ~ X, ...)`, you are labeling `Y` as the response variable. The model retains this information, so then `predict()` knows this is the variable you are trying to predict based on the other variables (explanatory factors). – r2evans Nov 18 '14 at 17:52
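
The point made in the comments above can be checked directly. The following is a minimal sketch on made-up data (only the column names `Y` and `X` come from the question; the values and the seed are assumptions for illustration). It compares predictions from `newdata` with and without the `Y` column:

```r
# Hypothetical toy data; only the column names Y and X come from the question.
set.seed(1)
training <- data.frame(Y = rnorm(20), X = rnorm(20))
testing  <- data.frame(Y = rnorm(5),  X = rnorm(5))

m <- lm(Y ~ X, data = training)

# Predict once with the full test data frame and once with X only.
with_y    <- predict(m, newdata = testing)
without_y <- predict(m, newdata = testing["X"])

# The two sets of predictions should agree, because predict() looks up
# only the explanatory variables named in the model formula.
all.equal(with_y, without_y)
```

If the two results agree, the `Y` column in `testing` played no part in the prediction.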

1 Answer

Use `predict()`; for an `lm` object it dispatches to `predict.lm` (see `?predict.lm`):

Y_pred <- predict(m, newdata = testing)
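
`predict()` returns predicted values rather than a new `lm` object, so quantities such as residuals or error measures for `testing` have to be computed by hand. One possible way to do that is sketched below on made-up data. The simulated values and the names `pred`, `test_resid`, `rmse`, `r2_test`, and `results` are assumptions for illustration, not part of the original answer:

```r
# Hypothetical toy data standing in for the question's data frames.
set.seed(42)
x_train  <- rnorm(30)
x_test   <- rnorm(10)
training <- data.frame(Y = 2 * x_train + rnorm(30), X = x_train)
testing  <- data.frame(Y = 2 * x_test  + rnorm(10), X = x_test)

m <- lm(Y ~ X, data = training)

# Predicted values for the test set, with 95% prediction intervals.
# The result is a matrix with columns "fit", "lwr", and "upr".
pred <- predict(m, newdata = testing, interval = "prediction", level = 0.95)

# Out-of-sample residuals and two summary error measures.
test_resid <- testing$Y - pred[, "fit"]
rmse       <- sqrt(mean(test_resid^2))
r2_test    <- 1 - sum(test_resid^2) / sum((testing$Y - mean(testing$Y))^2)

# Collect everything in one data frame for plotting or further analysis.
results <- cbind(testing, pred, resid = test_resid)
head(results)
```

The `results` data frame can then be passed to plotting functions, for example to plot `resid` against `fit` as one would with the residuals of `m`.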
user1808924