2

I'm interested in predicting Y and am studying two different measurement techniques, X1 and X2. For instance, I might want to predict the tastiness of a banana either by measuring how long it has been lying on the table, or by counting the number of brown spots on it.

I want to know which one of the measuring techniques is better, should I choose to perform only one.

I can create a linear model in R:

m1 = lm(Y ~ X1)
m2 = lm(Y ~ X2)

Now let's say X1 is a better predictor of banana tastiness than X2. When calculating the R^2 of the two models, the R^2 of model m1 is clearly higher than that of model m2. Before writing a paper on how method X1 is better than X2, I want some indication that the difference is not due to chance, ideally in the form of a p-value.

How would one go about this? And how would I do it when I'm using different brands of bananas and move to a linear mixed-effects model that incorporates banana brand as a random effect?
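One way to sketch the setup is to bootstrap the difference in R^2 between the two models: resample the data, refit both models each time, and check whether the interval for the difference excludes zero. The data below is simulated purely for illustration (the variable names and effect sizes are assumptions, not from the question):

```r
# Hypothetical sketch: simulated data standing in for real banana measurements
set.seed(1)
n  <- 100
X1 <- rnorm(n)                       # e.g. days on the table
X2 <- rnorm(n)                       # e.g. number of brown spots
Y  <- 0.8 * X1 + 0.3 * X2 + rnorm(n)
d  <- data.frame(Y, X1, X2)

m1 <- lm(Y ~ X1, data = d)
m2 <- lm(Y ~ X2, data = d)

# Statistic for the bootstrap: difference in R^2 between the two models
r2_diff <- function(dat, idx) {
  dat <- dat[idx, ]
  summary(lm(Y ~ X1, data = dat))$r.squared -
    summary(lm(Y ~ X2, data = dat))$r.squared
}

library(boot)                        # ships with base R
b <- boot(d, r2_diff, R = 1000)
boot.ci(b, type = "perc")            # an interval excluding 0 suggests a real difference
```

For the mixed-effects version, the analogous models could be fit with `lme4::lmer(Y ~ X1 + (1 | brand), data = d)`, keeping the random-intercept term identical in both models so that only the fixed predictor differs.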

Marijn van Vliet • 5,239 • 2 • 33 • 45
  • I might compare [mean squared prediction errors](http://stats.stackexchange.com/q/20741/11849). – Roland Jun 21 '13 at 13:04
  • Comparing models using `anova` function: http://stats.stackexchange.com/q/53312/8464 – topchef Jun 21 '13 at 15:41
  • 2
    You can only compare models with `anova` when they're nested models, which these are not. The requested testing would probably best be done using AIC. – John Jun 21 '13 at 16:19
  • @John - absolutely right, data is different - so ANOVA doesn't apply. – topchef Jun 22 '13 at 00:04
  • Moved this question to stats.stackexchange.com where it belongs. Sorry to have wasted your time. Thanks for your comments so far. – Marijn van Vliet Jun 25 '13 at 08:38
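The comments above suggest AIC (since the models are not nested, `anova` does not apply) and mean squared prediction error. A minimal sketch of both, again on simulated stand-in data, could look like this; the leave-one-out error uses the standard PRESS shortcut for `lm` fits:

```r
# Hypothetical sketch of the comment suggestions: AIC and leave-one-out
# prediction error for two non-nested models with the same response.
set.seed(2)
n  <- 100
X1 <- rnorm(n)
X2 <- rnorm(n)
Y  <- 0.8 * X1 + 0.3 * X2 + rnorm(n)
d  <- data.frame(Y, X1, X2)

m1 <- lm(Y ~ X1, data = d)
m2 <- lm(Y ~ X2, data = d)

AIC(m1, m2)   # lower AIC = better fit, penalized for complexity

# Leave-one-out mean squared prediction error (PRESS / n) for an lm fit,
# computed from residuals and leverages without refitting n times
loo_mspe <- function(fit) mean((residuals(fit) / (1 - hatvalues(fit)))^2)
loo_mspe(m1)
loo_mspe(m2)  # the model with the lower value predicts new bananas better
```

Note that AIC comparisons require both models to be fit to the same response on the same data, which holds here.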

1 Answer

1

Sorry if I didn't understand you right. As far as I can tell, this is a basic statistics question rather than an R question.

You can put them together in one regression; the p-value for each coefficient reveals whether it is significant. You can also include banana brand as a dummy variable (if there are not too many brands) and run ANOVA tests. By the way, are both measurement techniques significant in separate models? What are the R^2 values of those models and of the combined model? As for your problem, look at the definition of R^2; I hope that helps :)
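A sketch of the combined model described above, with brand entered as a factor (R creates the dummy coding automatically); the data and effect sizes are invented for illustration:

```r
# Hypothetical sketch: both predictors plus banana brand in one regression
set.seed(3)
n     <- 90
brand <- factor(rep(c("A", "B", "C"), each = 30))
X1    <- rnorm(n)
X2    <- rnorm(n)
shift <- c(A = 0, B = 0.5, C = -0.5)        # assumed brand offsets
Y     <- 0.8 * X1 + 0.3 * X2 + shift[as.character(brand)] + rnorm(n)
d     <- data.frame(Y, X1, X2, brand)

m_both <- lm(Y ~ X1 + X2 + brand, data = d)
summary(m_both)   # per-coefficient p-values
anova(m_both)     # ANOVA table across terms, including the brand factor
```

Coefficients: intercept, X1, X2, and one dummy each for brands B and C (A is the reference level). As the comments below point out, this tells you about each predictor's contribution in the joint model, which is not the same as asking which predictor is better on its own.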

Asayat • 645 • 10 • 23
  • 2
No, they want to compare models. Putting both predictors in one model won't tell you which is the better predictor on its own. – Roland Jun 21 '13 at 13:01
Then if both are significant separately, compare them by R^2: you can say that m1 explains a higher percentage of the variance in Y than m2 does. – Asayat Jun 21 '13 at 13:04
Putting both predictors in one model tells us something only if just one of them has a significant p-value. If both are significant, the one with the larger coefficient has the larger effect on the dependent variable. To compare models you could use AIC, but I don't think that's necessary in this case... – vodka Jun 21 '13 at 13:26