0

I would like to know how to properly back-transform the output from a univariate linear mixed effects model in order to interpret it. I have not posted data to go along with my question because my question should be answerable without data.

My model (simplified for the purposes of this question):

library(lme4)
m1<-lmer(activity ~ sex + BirthDate+ (1|id), data=merge.data)

> m1
Linear mixed model fit by REML ['lmerMod']
Formula: activity ~ sex + BirthDate + (1 | id)
   Data: merge.data
REML criterion at convergence: 572.0483
Random effects:
 Groups   Name        Std.Dev.
 id    (Intercept) 0.7194  
 Residual             1.4651  
    Number of obs: 150, groups:  id, 89
    Fixed Effects:
   (Intercept)            sexM       BirthDate  
      -0.08661         0.20718         0.43022  

Where:

  • activity is a continuous response variable
  • sex is a categorical variable with 2 levels (female and male)
  • BirthDate is a continuous variable; BirthDate is the number of days since January 1st and then it is mean centred and standardized to one standard deviation
  • id is a random effect for individual identity
  • merge.data is the name of my dataset

Before BirthDate is mean centred and standardized to one standard deviation:

> summary(merge.data$BirthDate)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  94.96  115.96  121.96  122.67  127.96  138.96 

After BirthDate is mean centred and standardized to one standard deviation:

merge.data<-merge.data %>%
    mutate(BirthDate = ((BirthDate-mean(BirthDate))/(1*(sd(BirthDate)))))

> summary(merge.data$BirthDate)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-3.09082 -0.74816 -0.07883  0.00000  0.59050  1.81761 

I would like to know what the mean value is for both sex and BirthDate. Based on reading The R Book by Crawley, I can get the mean from my model m1 with the following code:

tapply(predict(m1,type="response"), merge.data$sex,mean) #gives you the back-transformed mean for sex from the model "m1"

 F           M 
-0.08334649  0.11199685

Which says that the mean activity score for females is -0.083 and males is 0.11.

When I try this for BirthDate, like so:

 tapply(predict(m1,type="response"), merge.data$BirthDate,mean)

  -3.09082367412411    -1.6406056364576   -1.52905040279094 #mean centered birth date
        -0.79030344         -0.87012920         -0.44792213 #activity score

and so on...

What I end up getting is 1 mean value for every birth date (BirthDate is mean centred and standardized to one standard deviation). Unlike with sex, I can't really do anything with that information... I am trying to present the effect (effect size) of increasing birth date on activity.

What I would like to ultimately do is say that for every 1 day increase in birth date, there is an [number from model] increase in activity score.

Blundering Ecologist
  • 1,199
  • 2
  • 14
  • 38

1 Answers1

0

When you print out the model by typing m1, this part:

    Fixed Effects:
   (Intercept)            sexM       BirthDate  
      -0.08661         0.20718         0.43022  

tells you the slopes, i.e. how much the result will change based on a change in the inputs. In particular, if you increase BirthDate by one (and keep everything else the same), the predicted activity score will increase by 0.43022.

You do not provide any data so I cannot work directly with your data and your model. Instead, I will illustrate with some data built into R, the iris data.

## Build a linear model
Mod1 = lm(Petal.Length ~ ., data=iris[,1:4])

Now we could just type Mod1, but that gives more than I care to see. We can restrict our attention to the interesting part using

Mod1$coefficients
 (Intercept) Sepal.Length  Sepal.Width  Petal.Width 
  -0.2627112    0.7291384   -0.6460124    1.4467934

This gives the slope for each of the predictor variables (and the intercept). I want to illustrate how the response Petal.Length varies with the inputs. I will just take some point and change one predictor and look at the result.

NewPoint = iris[30,1:4]
NewPoint[,1] = NewPoint[,1]+1
iris[30, 1:4]
   Sepal.Length Sepal.Width Petal.Length Petal.Width
30          4.7         3.2          1.6         0.2
NewPoint
   Sepal.Length Sepal.Width Petal.Length Petal.Width
30          5.7         3.2          1.6         0.2

You can see that NewPoint is the same as the original point iris[30,1:4] except that Sepal.Length has been increased by 1. How does that affect the prediction?

predict(Mod1, newdata=iris[30,1:4])
      30 
1.386358 
predict(Mod1, newdata=NewPoint)
      30 
2.115497 
predict(Mod1, newdata=NewPoint) - predict(Mod1, newdata=iris[30,1:4])
       30 
0.7291384

The difference in the predicted values is 0.7291384, which is the coefficient for Sepal.Length shown above.

G5W
  • 36,531
  • 10
  • 47
  • 80
  • I have two follow-up questions: 1) does the factor that the model has a random effect change the interpretation of the model coefficient? 2) does the fact that `BirthDate` is standardized affect the interpretation of the model coefficient further? – Blundering Ecologist Apr 07 '19 at 14:02
  • 1
    1. No. The model provides a prediction based on the predictor variables. The random part influences the error in these predictions, but does not change the prediction. 2. Yes. The slope (rate of change) given by your model is for the variable that you built it on - the standardized dates. So the slope in the model for BirthDate `0.43022` applies to a change in _standardized_ BirthDates. To find the change in activity score for a change in the original BirthDates by one, you need to scale the result. The change will be `(1 / sd(BirthDate)) * 0.43022` – G5W Apr 07 '19 at 14:11
  • Ok. That makes sense. I don't understand why I am getting multiple outputs for the `predict(m1, data=merge.data)` line then (I had thought this was because of the random effect)? – Blundering Ecologist Apr 07 '19 at 14:51
  • (I should clarify, it's the `tapply(predict(m1,type="response"), merge.data$BirthDate,mean)` line in my OP) – Blundering Ecologist Apr 07 '19 at 15:13