
My linear model would be `Score ~ Age + Collection1 + Collection3`.

I transformed the Collection column into dummy variables and left out the Collection5 column to avoid the dummy variable trap.

For Collections 1, 3, and 5, I sample the same people, and each collection period takes place at a fixed time (Collection 1 = year 2000, Collection 3 = year 2002, Collection 5 = year 2004), hence the +2 in age between consecutive collections for the same contacts.

Would the variables Age, Collection1, and Collection3 be multicollinear? On one hand, age increases with each later collection, so the two seem correlated. On the other hand, since Collection is encoded as several 0/1 dummy variables, I suspect it shouldn't matter.

Is there a logical explanation for why these variables should or shouldn't be multicollinear, or for whether this setup breaks other assumptions?

Example Data

  • 2
    Can you please post your example data in text format rather than as an image? – Ben Bolker Mar 19 '22 at 18:06
  • 2
    This sounds like a statistics question and not a coding question, and thus might be better for https://math.stackexchange.com/. However, the typical way of checking for multicollinearity is to run the VIF test. You can do that in R with, for example, `car::vif()`; see more [here](https://www.statology.org/multicollinearity-regression/). – jpiversen Mar 19 '22 at 18:19
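
To make the VIF suggestion concrete, here is a minimal sketch in plain Python (standard library only) of the VIF for Age given the Collection dummies. Everything in it is an assumption made for illustration: the panel is simulated rather than taken from the question's data, and `offsets`, `baseline_ages`, `rows` and `vif_age` are hypothetical names. It relies on the fact that regressing Age on an intercept plus the Collection dummies simply fits the mean age within each collection, so the VIF reduces to total sum of squares divided by within-collection sum of squares.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical panel (simulated, NOT the asker's data): 50 people, each seen in
# Collections 1, 3 and 5 (years 2000, 2002, 2004), ageing 2 years per collection.
offsets = {1: 0, 3: 2, 5: 4}
baseline_ages = [random.randint(20, 60) for _ in range(50)]

# One (collection, age) row per person per collection.
rows = [(c, age0 + off) for age0 in baseline_ages for c, off in offsets.items()]


def vif_age(rows):
    """VIF of Age given an intercept and the Collection dummies.

    The auxiliary regression of Age on the dummies fits per-collection means,
    so R^2 = 1 - SS_within / SS_total and VIF = 1 / (1 - R^2) = SS_total / SS_within.
    """
    ages = [a for _, a in rows]
    grand_mean = mean(ages)
    ss_total = sum((a - grand_mean) ** 2 for a in ages)

    collections = {c for c, _ in rows}
    group_means = {c: mean(a for cc, a in rows if cc == c) for c in collections}
    ss_within = sum((a - group_means[c]) ** 2 for c, a in rows)

    if ss_within == 0:
        # Age is fully determined by Collection: perfect collinearity.
        return float("inf")
    return ss_total / ss_within


print(round(vif_age(rows), 3))
```

Under these assumptions, Age is only exactly collinear with the dummies if everyone starts at the same age, in which case `vif_age` returns infinity. When baseline ages differ a lot between people, the 2-year steps explain only a small share of the variation in Age, so the VIF should stay close to 1.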
