I have a response variable, per-capita income, that I want to relate to a predictor, study period. Study period is a factor with 3 levels: the first period spans 2008-2009, the second 2010-2012, and the third 2013-2015.

I would like to perform a multiple linear regression in R with these 2 variables and two others (age and gender). In short, the formula is per-capita income ~ study period + age + gender.

I performed the regression in two ways:

  1. Consider study period as nominal (unordered):

    lm(PCI ~ factor(STUDY_PERIOD) + AGE + GENDER, data = df)

    # Coefficients:
    # (Intercept)  factor(STUDY_PERIOD)2  factor(STUDY_PERIOD)3  AGE    GENDERM
    # 356.07       63.15                  112.71                 -1.44  -43.73

  2. Consider study period as ordinal (ordered):

    df$STUDY_PERIOD <- ordered(df$STUDY_PERIOD, levels = c(1, 2, 3))
    lm(PCI ~ STUDY_PERIOD + AGE + GENDER, data = df)

    # Coefficients:
    # (Intercept)  STUDY_PERIOD.L  STUDY_PERIOD.Q  AGE     GENDERM
    # 414.690      79.697          -5.551          -1.440  -43.728

The two models give different coefficients for study period.

My questions:

  1. What should I consider STUDY_PERIOD as?
  2. How do I interpret the coefficients in both cases?

Thank you!

HNSKD
  • Your results are the same; use whichever is easier for you to interpret. (Notice how the AGE and GENDERM coefficients are the same? The overall effect of STUDY_PERIOD is the same.) You don't show it, but the log likelihood, deviance, AIC, etc. are also the same. See `?contrasts` for some details. If you have more stats questions, ask at stats.stackexchange. – Gregor Thomas Dec 05 '17 at 02:30
  • Possible duplicate of [Interpretation of ordered and non-ordered factors, vs. numerical predictors in model summary](https://stackoverflow.com/questions/25735636/interpretation-of-ordered-and-non-ordered-factors-vs-numerical-predictors-in-m) – Weihuang Wong Dec 05 '17 at 03:09
  • Hi @WeihuangWong, thank you for the link to a similar question. It is interesting to see how the coefficients are interpreted when the variable is treated as numeric, nominal, or ordinal. However, I need help deciding how I should treat the variable. Is it subjective? Is it really up to us to decide? – HNSKD Dec 05 '17 at 06:44

1 Answer

It depends on the question you are asking. As an unordered factor, you are asking for k-1 estimates, where k is the number of categories. With R's default treatment contrasts, the first estimate compares period 2 with period 1, while the second compares period 3 with period 1.
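To make the contrast interpretation concrete, here is a minimal sketch on simulated data. The data frame `sim` and its effect sizes are invented for illustration and are not the asker's data. It fits the same model with an unordered and an ordered factor and checks that the two fits agree, so only the coding of the period coefficients differs.

```r
# Simulated stand-in for the asker's data; every value here is made up.
set.seed(1)
n <- 300
sim <- data.frame(
  STUDY_PERIOD = sample(1:3, n, replace = TRUE),
  AGE          = round(runif(n, 20, 70)),
  GENDER       = sample(c("F", "M"), n, replace = TRUE)
)
sim$PCI <- 350 + c(0, 60, 110)[sim$STUDY_PERIOD] - 1.5 * sim$AGE -
  40 * (sim$GENDER == "M") + rnorm(n, sd = 50)

# Unordered factor: treatment contrasts, period 1 is the reference level.
fit_nominal <- lm(PCI ~ factor(STUDY_PERIOD) + AGE + GENDER, data = sim)

# Ordered factor: orthogonal polynomial contrasts (.L and .Q terms).
fit_ordinal <- lm(PCI ~ ordered(STUDY_PERIOD) + AGE + GENDER, data = sim)

coef(fit_nominal)  # periods 2 and 3 are each compared with period 1
coef(fit_ordinal)  # linear and quadratic trend across the three periods

# Same model, different parameterization: fitted values and AIC agree.
all.equal(fitted(fit_nominal), fitted(fit_ordinal))
all.equal(AIC(fit_nominal), AIC(fit_ordinal))
```

Both `all.equal()` calls should return `TRUE`, which matches the observation in the comments that AGE and GENDERM are identical across the two fits.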

As a linear variable (numeric, not ordinal), you are asking "as the time period increases, does PCI increase or decrease?" The slope here is the change in PCI per period.

The linear coding is the easiest to interpret, but it may mask what the actual effects are. Here, though, the trend may well be linear, since the estimate for factor(STUDY_PERIOD)3 (112.71) is roughly twice the estimate for factor(STUDY_PERIOD)2 (63.15). A simple way to check is to look at a plot.
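One possible way to check linearity more formally, sketched on simulated data (the data frame `sim`, the column `PERIOD_NUM`, and the effect sizes are hypothetical): fit period once as a number and once as a factor, then compare the two nested models with an F test. A small p-value would suggest that the straight-line coding loses information.

```r
# Simulated data with a truly linear period effect; all values are made up.
set.seed(2)
n <- 300
sim <- data.frame(
  PERIOD_NUM = sample(1:3, n, replace = TRUE),
  AGE        = round(runif(n, 20, 70)),
  GENDER     = sample(c("F", "M"), n, replace = TRUE)
)
sim$PCI <- 350 + 55 * sim$PERIOD_NUM - 1.5 * sim$AGE -
  40 * (sim$GENDER == "M") + rnorm(n, sd = 50)

# Period as a number: one slope, the change in PCI per period.
fit_linear <- lm(PCI ~ PERIOD_NUM + AGE + GENDER, data = sim)

# Period as a factor: one extra parameter that allows a non-linear pattern.
fit_factor <- lm(PCI ~ factor(PERIOD_NUM) + AGE + GENDER, data = sim)

# Nested-model F test on that single extra degree of freedom.
cmp <- anova(fit_linear, fit_factor)
print(cmp)

# Raw mean PCI per period, the numbers behind a quick visual check.
period_means <- tapply(sim$PCI, sim$PERIOD_NUM, mean)
print(period_means)
```

Passing `period_means` to `plot(1:3, period_means, type = "b")` gives the visual version of the same check.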

Stephan Arndt
  • Hi Stephan, thank you for your answer. By linear, you meant nominal (unordered), not numeric, right? – HNSKD Dec 05 '17 at 06:47
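  • Linear as in 1, 2, 3 or, if you centered it, -1, 0, 1. So, numeric. The 79.697 estimate is the linear trend across that scale. Note that R's default polynomial contrasts for an ordered factor scale the scores to about -0.707, 0, 0.707, so the change per period is 79.697 / sqrt(2), about 56.4. – Stephan Arndt Dec 06 '17 at 13:00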
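The two sets of period coefficients in the question can be reconciled with a few lines of arithmetic. This sketch assumes R's default `contr.poly` coding for ordered factors and uses only the coefficients posted in the question; small discrepancies come from rounding in the printed output.

```r
# Default polynomial contrast scores for a 3-level ordered factor.
# Column .L is about (-0.707, 0, 0.707); column .Q is about (0.408, -0.816, 0.408).
P <- contr.poly(3)
print(P)

# Period coefficients posted in the question for the ordered-factor model.
b_poly <- c(79.697, -5.551)

# Contribution of STUDY_PERIOD to the prediction at each of the 3 levels.
effect <- drop(P %*% b_poly)

# Differences from period 1 reproduce the treatment-contrast coefficients.
effect[2] - effect[1]  # about 63.15, i.e. factor(STUDY_PERIOD)2
effect[3] - effect[1]  # about 112.71, i.e. factor(STUDY_PERIOD)3

# The intercepts are linked the same way: 414.690 plus the period-1 effect.
414.690 + effect[1]    # about 356.07, the intercept of the nominal model

# Average change per period implied by the linear term.
79.697 / sqrt(2)       # about 56.35, half of 112.71
```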