I have a response variable, per-capita income (`PCI`), which is associated with the predictor study period (`STUDY_PERIOD`). Study period is a factor with 3 levels: the first period spans 2008-2009, the second 2010-2012, and the third 2013-2015.
I would like to fit a multiple linear regression in R with these two variables plus two covariates, age (`AGE`) and gender (`GENDER`). In short, the model is `PCI ~ STUDY_PERIOD + AGE + GENDER`.
I performed the regression in two ways:
- Treating study period as nominal (unordered):

```r
lm(PCI ~ factor(STUDY_PERIOD) + AGE + GENDER, data = df)
# Coefficients:
#  (Intercept)  factor(STUDY_PERIOD)2  factor(STUDY_PERIOD)3    AGE  GENDERM
#       356.07                  63.15                 112.71  -1.44   -43.73
```
- Treating study period as ordinal (ordered):

```r
df$STUDY_PERIOD <- ordered(df$STUDY_PERIOD, levels = c(1, 2, 3))
lm(PCI ~ STUDY_PERIOD + AGE + GENDER, data = df)
# Coefficients:
#  (Intercept)  STUDY_PERIOD.L  STUDY_PERIOD.Q     AGE  GENDERM
#      414.690          79.697          -5.551  -1.440  -43.728
```
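A quick arithmetic check, using only the coefficients printed above, suggests that the two sets of study-period coefficients describe the same three period effects. The sketch below assumes R's default contrasts (`contr.treatment` for unordered factors, `contr.poly` for ordered ones). Under treatment contrasts the intercept is period 1 (at AGE = 0 and the reference gender) and the other two coefficients are differences from period 1. Under polynomial contrasts the intercept is the average of the three period effects, `.L` is a linear trend and `.Q` a quadratic (curvature) term. Names such as `period_means` and `b_poly` are illustrative only.

```r
# Coefficients copied from the unordered-factor output in the question
intercept_trt <- 356.07
b_trt <- c(63.15, 112.71)   # factor(STUDY_PERIOD)2, factor(STUDY_PERIOD)3

# Treatment contrasts: period 1 is the baseline, so the implied period
# effects (at AGE = 0, reference gender) are 356.07, 419.22, 468.78
period_means <- intercept_trt + c(0, b_trt)

# R's default contrasts for a 3-level ordered factor: columns .L and .Q
P <- contr.poly(3)

# The columns of P are orthonormal and each sums to zero, so the
# polynomial intercept is the mean of the period effects and the
# .L / .Q coefficients are projections of the period effects onto P:
#   .L = (p3 - p1) / sqrt(2),  .Q = (p1 - 2 * p2 + p3) / sqrt(6)
intercept_poly <- mean(period_means)
b_poly <- drop(t(P) %*% period_means)

round(intercept_poly, 3)   # 414.69
round(b_poly, 3)
```

The resulting `.L` and `.Q` values land very close to the 79.697 and -5.551 reported above; the small differences in the last digit come from the rounding of the printed treatment-coded coefficients. Note that `contr.poly` treats the three levels as equally spaced by default, whereas the periods here cover 2, 3 and 3 years; its `scores` argument is the place to look if that spacing should be reflected.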
The two fits give different coefficients for the study periods, although the AGE and GENDER coefficients are identical.
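Because the study-period columns span the same space under either coding, the two fits should be one model with two parameterizations. The sketch below checks this on hypothetical simulated data: the data-generating values and the `PERIOD_ORD` column are made up for illustration and are not taken from the question's data.

```r
# Hypothetical simulated data: the column names mirror the question,
# but every value below is made up for illustration.
set.seed(1)
n  <- 300
df <- data.frame(
  STUDY_PERIOD = sample(1:3, n, replace = TRUE),
  AGE          = round(runif(n, 20, 70)),
  GENDER       = factor(sample(c("F", "M"), n, replace = TRUE))
)
df$PCI <- 350 + c(0, 60, 110)[df$STUDY_PERIOD] - 1.4 * df$AGE -
  45 * (df$GENDER == "M") + rnorm(n, sd = 50)

# Unordered factor: treatment contrasts, period 1 is the baseline
fit_nom <- lm(PCI ~ factor(STUDY_PERIOD) + AGE + GENDER, data = df)

# Ordered factor: polynomial contrasts (.L and .Q)
df$PERIOD_ORD <- ordered(df$STUDY_PERIOD, levels = c(1, 2, 3))
fit_ord <- lm(PCI ~ PERIOD_ORD + AGE + GENDER, data = df)

# Same fitted values, residual sum of squares, and AGE/GENDER effects
all.equal(fitted(fit_nom), fitted(fit_ord))          # TRUE
all.equal(deviance(fit_nom), deviance(fit_ord))      # TRUE
all.equal(coef(fit_nom)[c("AGE", "GENDERM")],
          coef(fit_ord)[c("AGE", "GENDERM")])        # TRUE

# The ordered-factor intercept is the average of the three period
# effects implied by the treatment-coded fit
b <- unname(coef(fit_nom))
all.equal(mean(c(b[1], b[1] + b[2], b[1] + b[3])),
          unname(coef(fit_ord)[1]))                  # TRUE
```

If these checks hold, choosing `factor()` versus `ordered()` changes only how the period effects are expressed (differences from period 1 versus linear and quadratic trend terms), not the fit itself.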
My questions:
- Should I treat `STUDY_PERIOD` as an unordered (nominal) or an ordered (ordinal) factor?
- How do I interpret the study-period coefficients in each case?
Thank you!