
I have fitted a model where:

Y ~ A + A^2 + B + mixed.effect(C)

Y is continuous, A is continuous, and B actually refers to a day and currently looks like this:

Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 11 < 12

I can easily change the data type, but I'm not sure whether it is more appropriate to treat B as numeric, as a factor, or as an ordered factor. And when it is treated as numeric or as an ordered factor, I'm not quite sure how to interpret the output.

When treated as an ordered factor, summary(my.model) outputs something like this:

Linear mixed model fit by REML ['lmerMod']
Formula: Y ~ A + I(A^2) + B +  (1 | mixed.effect.C)
Fixed effects:
                       Estimate Std. Error t value
(Intercept)              19.04821    0.40926   46.54
A                      -151.01643    7.19035  -21.00
I(A^2)                  457.19856   31.77830   14.39
B.L                      -3.00811    0.29688  -10.13
B.Q                      -0.12105    0.24561   -0.49
B.C                       0.35457    0.24650    1.44
B^4                       0.09743    0.24111    0.40
B^5                      -0.08119    0.22810   -0.36
B^6                       0.19640    0.22377    0.88
B^7                       0.02043    0.21016    0.10
B^8                      -0.48931    0.20232   -2.42
B^9                      -0.43027    0.17798   -2.42
B^10                     -0.13234    0.15379   -0.86

What are L, Q, and C? I need to know the effect of each additional day (B) on the response (Y). How do I get this information from the output?

When I treat B as.numeric, I get something like this as output:

Fixed effects:
                       Estimate  Std. Error t value
(Intercept)            20.79679    0.39906   52.11
A                    -152.29941    7.17939  -21.21
I(A^2)                461.89157   31.79899   14.53
B                      -0.27321    0.02391  -11.42

To get the effect of each additional day (B) on the response (Y), am I supposed to multiply the coefficient of B times B (the day number)? Not sure what to do with this output...

kdarras
  • Those are orthogonal polynomial contrasts. Most people will not want to use ordered factors, especially if they do not already understand these terms. And if you make quadratic models for inference, please learn to use `poly()` rather than `I()`. – IRTFM Sep 09 '14 at 02:22

1 Answer


This is not really a mixed-model specific question, but rather a general question about model parameterization in R.

Let's try a simple example.

set.seed(101)
d <- data.frame(x=sample(1:4,size=30,replace=TRUE))
d$y <- rnorm(30,1+2*d$x,sd=0.01)

x as numeric

This just does a linear regression: the x parameter denotes the change in y per unit of change in x; the intercept specifies the expected value of y at x=0.

coef(lm(y~x,d))
## (Intercept)           x 
##   0.9973078   2.0001922 
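
To make the "change in y per unit of x" reading concrete, here is a minimal sketch that rebuilds predictions by hand from the two coefficients and compares them with `predict()`. It regenerates the toy data from above; the names `fit`, `new_x`, `by_hand`, `by_predict` and `step` are only illustrative, and the exact numbers may differ from those printed above depending on the R version's random number settings.

```r
set.seed(101)
d <- data.frame(x = sample(1:4, size = 30, replace = TRUE))
d$y <- rnorm(30, 1 + 2 * d$x, sd = 0.01)

fit <- lm(y ~ x, d)
b <- coef(fit)

# Prediction by hand: intercept + slope * x
new_x <- data.frame(x = c(1, 2, 3))
by_hand <- unname(b["(Intercept)"] + b["x"] * new_x$x)
by_predict <- unname(predict(fit, newdata = new_x))

# Moving x up by one unit changes the prediction by the slope
step <- diff(by_predict)

print(cbind(x = new_x$x, by_hand, by_predict))
print(step)
```

With several numeric predictors the same logic applies term by term: each coefficient is multiplied by that observation's value of its predictor and the products are added to the intercept, so the coefficient of a numeric day variable is the expected change in the response per additional day with the other predictors held fixed.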

x as (unordered/regular) factor

coef(lm(y~factor(x),d))
## (Intercept)  factor(x)2  factor(x)3  factor(x)4 
##    3.001627    1.991260    3.995619    5.999098 

The intercept specifies the expected value of y in the baseline level of the factor (x=1); each of the other parameters specifies the difference between the expected value of y at that level of x and its expected value at the baseline level.
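
As a quick check of this reading, the following sketch (assuming the same toy data; `means` and `rebuilt` are illustrative names) recomputes the coefficients directly from the per-level means of y.

```r
set.seed(101)
d <- data.frame(x = sample(1:4, size = 30, replace = TRUE))
d$y <- rnorm(30, 1 + 2 * d$x, sd = 0.01)

# Mean of y at each level of x
means <- sapply(split(d$y, d$x), mean)

b <- coef(lm(y ~ factor(x), d))

# Intercept = mean at the baseline level;
# remaining coefficients = mean at each other level minus the baseline mean
rebuilt <- c(means[1], means[-1] - means[1])

print(cbind(coefficient = b, from_means = rebuilt))
```

The two columns should agree up to floating-point error, because a model with a single factor simply reproduces the level means.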

x as ordered factor

coef(lm(y~ordered(x),d))
##  (Intercept) ordered(x).L ordered(x).Q ordered(x).C 
##  5.998121421  4.472505514  0.006109021 -0.003125958 

Now the intercept specifies the average of the expected values of y across the factor levels (here, the value of y halfway between levels 2 and 3); the L (linear) parameter gives a measure of the linear trend (not quite sure I can explain the particular value ...), Q and C specify quadratic and cubic terms (which are close to zero in this case because the pattern is linear); if there were more levels the higher-order contrasts would be labelled ^4, ^5, ...
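
One way to read the particular value of L is through the contrast matrix itself. The sketch below assumes the same toy data and uses `contr.poly()`, the default contrast for ordered factors; `P`, `rebuilt`, `step_L` and `slope_per_level` are illustrative names. It shows that each level mean is the intercept plus that level's row of the contrast matrix times the polynomial coefficients, and that L times the spacing between adjacent entries of the linear contrast column recovers the trend per level.

```r
set.seed(101)
d <- data.frame(x = sample(1:4, size = 30, replace = TRUE))
d$y <- rnorm(30, 1 + 2 * d$x, sd = 0.01)

means <- sapply(split(d$y, d$x), mean)   # mean of y at each level
k <- length(means)
P <- contr.poly(k)                       # columns .L, .Q, .C, ... (orthonormal)

b <- coef(lm(y ~ ordered(x), d))

# Level means rebuilt from the coefficients: intercept + P %*% (L, Q, C, ...)
rebuilt <- as.vector(b[1] + P %*% b[-1])

# The linear column is equally spaced; its spacing converts L to a per-level trend
step_L <- diff(P[, ".L"])[1]
slope_per_level <- unname(b[2] * step_L)

print(P)
print(cbind(level_mean = means, rebuilt))
print(slope_per_level)
```

For four levels the linear column is (-3, -1, 1, 3)/sqrt(20), so the spacing is 2/sqrt(20) and a straight-line trend of about 2 per level shows up as an L of about 2 * sqrt(20)/2. Note that `contr.poly()` with its default scores treats the levels as equally spaced, so for a factor whose levels skip a value (such as days 1 to 9, 11, 12) this is a trend per level rather than per day.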

successive-differences contrasts

coef(lm(y~factor(x),d,contrasts=list(`factor(x)`=MASS::contr.sdif)))
##  (Intercept) factor(x)2-1 factor(x)3-2 factor(x)4-3 
##     5.998121     1.991260     2.004359     2.003478 

This contrast specifies the parameters as the differences between successive levels, which are all a constant value of (approximately) 2.
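
The same kind of check works here. This sketch assumes the toy data from above and that the MASS package is installed (it ships with standard R distributions); `f`, `means` and `rebuilt` are illustrative names. It compares the coefficients with the differences between adjacent level means.

```r
set.seed(101)
d <- data.frame(x = sample(1:4, size = 30, replace = TRUE))
d$y <- rnorm(30, 1 + 2 * d$x, sd = 0.01)

d$f <- factor(d$x)
means <- sapply(split(d$y, d$f), mean)   # mean of y at each level

b <- coef(lm(y ~ f, d, contrasts = list(f = MASS::contr.sdif)))

# Intercept = unweighted average of the level means;
# remaining coefficients = differences between successive level means
rebuilt <- c(mean(means), diff(means))

print(cbind(coefficient = b, from_means = rebuilt))
```

If the question is "what is the effect of each additional day", this parameterization answers it level by level without assuming the steps are all the same size, whereas the numeric coding assumes a single common step.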

Ben Bolker
  • Hi Ben, Thanks for the quick, constructive response. I totally understand how to interpret the coefficients for regular factors, and the slope given a single numeric explanatory variable. So, if I were to have multiple explanatory variables, all numeric, how would that work? For a single value of Y, am I simply multiplying the coefficient for A by each value of A and the coefficient of B by each value of B? For approximately 800 data points, "B" can be any number between 1 and 12 and "A" varies continuously between 0.0100 and 0.200. – RedPandaSpaceOdyssey Sep 09 '14 at 04:43
  • Hello. I am having a little trouble understanding the "x as ordered factor" part of your answer, so I asked a question about it. Would you mind taking a look please? https://stackoverflow.com/questions/49722951/results-of-lm-function-with-a-dependent-ordered-categorical-variable – Ovi Apr 08 '18 at 21:34