0

This is my data frame (please copy and paste to reproduce):

Control <- replicate(2, c("112", "113", "116", "118", "127", "131", "134", "135", "136", "138", "143", "148", "149", "152", "153", "155", "162", "163"))
EPD <- replicate(2, c("101", "102", "103", "104", "105", "106", "107", "108", "109", "110", "114", "115", "117", "119", "120", "122", "124", "125", "126", "128", "130", "133", "137", "139", "140", "141", "142", "144", "145", "147"))
Subject <- c(Control, EPD)
Control_FA_L <- c(0.43, 0.39, 0.38, 0.58, 0.37, 0.5, 0.35, 0.36, 0.72, 0.38, 0.45, 0.30, 0.47, 0.30, 0.67, 0.34, 0.42, 0.29)
Control_FA_R <- c(0.36, 0.49, 0.55, 0.59, 0.33, 0.41, 0.32, 0.50, 0.59, 0.52, 0.32, 0.40, 0.49, 0.33, 0.46, 0.39, 0.37, 0.33)
EPD_FA_L <- c(0.25, 0.39, 0.36, 0.42, 0.21, 0.40, 0.43, 0.16, 0.31, 0.41, 0.39, 0.40, 0.35, 0.29, 0.31, 0.24, 0.39, 0.36, 0.54, 0.38, 0.34, 0.28, 0.42, 0.33, 0.40, 0.36, 0.42, 0.28, 0.40, 0.41)
EPD_FA_R <- c(0.26, 0.36, 0.36, 0.61, 0.22, 0.33, 0.36, 0.34, 0.35, 0.37, 0.39, 0.45, 0.30, 0.31, 0.50, 0.31, 0.29, 0.43, 0.41, 0.21, 0.38, 0.28, 0.66, 0.33, 0.50, 0.27, 0.46, 0.37, 0.26, 0.39)
FA <- c(Control_FA_L, Control_FA_R, EPD_FA_L, EPD_FA_R)
Control_Volume_L <- c(99, 119, 119, 146, 127, 96, 100, 132, 103, 103, 107, 142, 140, 134, 117, 117, 133, 143)
Control_Volume_R <- c(93, 123, 114, 152, 122, 105, 98, 138, 111, 110, 115, 137, 142, 140, 124, 102, 153, 143)
EPD_Volume_L <- c(132, 115, 140, 102, 130, 131, 110, 124, 102, 111, 93, 92, 94, 104, 92, 115, 144, 118, 104, 132, 90, 102, 94, 112, 106, 105, 79, 114, 104, 108)
EPD_Volume_R <- c(136, 116, 143, 105, 136, 137, 103, 121, 105, 115, 97, 97, 93, 108, 91, 117, 147, 111, 97, 129, 85, 107, 91, 116, 113, 101, 75, 108, 95, 98)
Volume <- c(Control_Volume_L, Control_Volume_R, EPD_Volume_L, EPD_Volume_R)
Group <- c(replicate(36, "Control"), replicate(60, "Patient"))

data <- data.frame(Subject, FA, Volume, Group) 

I then run a linear mixed model for FA values with the nlme package:

library(nlme)
lmm <- lme(FA ~ Volume + Group, ~ 1|Subject, data = data)
summary(lmm)

I would now like to determine the 95% Confidence Interval for the model estimated difference in FA between the two levels of the "Group" factor (Control & Patient). I would normally proceed by executing the following code:

# Compute 95% Confidence Interval for Group factor

# True difference in STN FA between Control and EPD subjects
0.0857851 # Value from mixed model

# Multiply 97.5 percentile point of normal distribution by std error from mixed model
1.96 * 0.02555076 # 95% CI:  0.086 ± 0.050 mm^3 (p = .0016) - !!CI includes values > 1!!

I am having a hard time interpreting what this means. The confidence interval I have calculated includes values greater than 1 which doesn't make sense since FA is supposed to be a ratio value from 0 to 1. Is the fact that my dependent variable is a ratio value the issue? If so would I need to transform my data in some way (i.e. log transform) to correct this? Any feedback would be greatly appreciated!

  • If you build a statistical model with a link and error structure that allows nonsensical results, then you have "broken statistics" and you get to keep all the parts. – IRTFM Feb 05 '19 at 01:50
  • After thinking about it some more, I've expanded my answer to show you how to fit a beta mixed effect model and compare the estimated marginal means; please take a look below. – Maurits Evers Feb 05 '19 at 05:39

1 Answers1

0

As pointed out by @42-, the issue here is with the model itself. Since FD is restricted to [0, 1] we can't use lme which assumes normal errors.

Model definition

I don't know any details about your data/experiment but perhaps a Beta model may work; specifically we could use a varying-intercept hierarchical model of the form

enter image description here

where we connect the mean of FD μ to a Subject-specific intercept and the predictors Volume and Group through a logit-link.

Implementation

The library glmmTMB allows to implement such a mixed-effect model

library(glmmTMB)
lmm <- glmmTMB(
    FA ~ Volume + Group + (1 | Subject),
    data = data,
    family = "beta_family")
summary(lmm)$coef$cond
#                 Estimate  Std. Error   z value     Pr(>|z|)
#(Intercept)   0.502858259 0.348506927  1.442893 0.1490505719
#Volume       -0.006464251 0.002782781 -2.322947 0.0201820253
#GroupPatient -0.369273205 0.104832100 -3.522520 0.0004274642

Some comments on the estimates

Note that estimates are given on the logit (log odds) scale; the estimate for Group = Control is then 0.503 - 0.369 * 0 = 0.503, and for Group = Patient it is 0.503 - 0.369 * 1 = 0.134. The difference between Group = Patient and Group = Control (again on the logit scale) is simply the coefficient for GroupPatient which is -0.369.

Comparison of marginal means

I then recommend using emmeans for any follow-up analyses; in this case we can use emmeans::pairs to compare the estimated marginal means (EMMs) for the two Group levels

library(emmeans)
confint(pairs(emmeans(lmm, "Group")))
# contrast           estimate        SE df  lower.CL  upper.CL
# Control - Patient 0.3692732 0.1048321 91 0.1610371 0.5775093
# 
#Results are given on the log odds ratio (not the response) scale.
#Confidence level used: 0.95

Note that results are given on the logit scale (and not on the response scale). To get a ratio of FD responses for Group = Patient and Group = Control you need to manually convert these estimates.

Explanation: Here emmeans returns the EMMs for Group, and pairs performs a pairwise comparison of the different levels of Group. We then use confint to return (default 95%) confidence intervals.

The nice thing is that you wouldn't have to change anything if you have >2 levels for Group; pairs would perform pairwise comparisons, and automatically correct p-values for multiple hypothesis testing.

For more reading, take a look at the excellent vignette Comparisons and contrasts in emmeans.


You can also get estimated marginal means and confidence intervals on the odds ratio scale (which avoids having to manually transform from the logit scale to the odds ratio scale)

confint(pairs(emmeans(lmm, "Group"), type = "response"))
# contrast          odds.ratio        SE df lower.CL upper.CL
# Control / Patient   1.446683 0.1516588 91 1.174729 1.781595

Confidence level used: 0.95
Intervals are back-transformed from the log odds ratio scale
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • 1
    I don't think this answers the question of what to do when your model predicts an impossibility. Using lme with a default link is not appropriate for predicted values that are restricted to 0-1 – IRTFM Feb 05 '19 at 02:27
  • @42- Hmm yes, you're right. I missed that bit. The issue here seems to be the model itself; there are not enough details about the data but a binomial/logistic model might be better. Or perhaps model `FD ~ Beta(alpha, beta)`? I don't think `lme` allows for models with non-normal errors, a way forward may be a mixed-effect model in Stan/rstan? – Maurits Evers Feb 05 '19 at 02:36
  • @JakeNiederer Let me know if you need more details; happy to expand. – Maurits Evers Feb 05 '19 at 04:38
  • @MauritsEvers Thank you so much, this is all very helpful! I am running a summary on the beta mixed model created with 'glmmTMB' and am having a hard time interpreting what the model estimated difference in group would be? If I am reading the summary correctly it seems that -0.369273 is the estimate given for 'Group'. This seems very large considering that mean 'FA' for the Control group was 0.43 compared to mean 'FA' for the Patient group of 0.37 – Jake Niederer Feb 06 '19 at 00:29
  • @MauritsEvers Furthermore, when running 'lmm <- glmmTMB(FA ~ Volume + Group + (1 | Subject), data = data, family = "beta_family")' I am receiving the following warning messages without much knowledge of what they are referring to: "1: In f(par, order = order, ...) : value out of range in 'lgamma' 2: In f(par, order = order, ...) : value out of range in 'lgamma'" – Jake Niederer Feb 06 '19 at 00:31
  • Hi @JakeNiederer; concerning the estimates keep in mind that we are on the logit (log odds) scale (note that I made a mistake in my model in that `glmmTMB` by default uses a `logit` link function and not a `log` link, I've fixed this now). You'll need to convert back to the response scale (as indicated by the output of `emmeans::pairs`). I'll add an example in my answer once I get the chance. – Maurits Evers Feb 06 '19 at 05:26
  • @JakeNiederer [continued] I've added a comment on the scale of estimates after the model fit. Concerning the warnings you are getting: I can't really say too much about this other than what the [glmmTMB troubleshooting vignette](https://cran.r-project.org/web/packages/glmmTMB/vignettes/troubleshooting.html) already says: *"[These] warnings indicate possibly-transient numerical problems with the fit, and can be treated in the same way (i.e. ignored if there are no errors or convergence warnings about the final fitted model)."* – Maurits Evers Feb 06 '19 at 05:57
  • @JakeNiederer [continued] Since there are no convergence warnings and the fit results look reasonable I'd say these warnings can be ignored in your case. – Maurits Evers Feb 06 '19 at 05:57
  • @MauritsEvers Fantastic, thank you! Are there any assumptions the beta model makes that I should be aware of? For example fitting a linear mixed model with lme we assume heteroscesdasticity and normal errors for which I would usually check using 'plot(lmm)' and 'qqline(resid(lmm))'. I'm just wondering what assumptions I will need to check for the beta model. – Jake Niederer Feb 06 '19 at 16:53
  • @MauritsEvers Sorry for all of the additional questions here. I'm also a bit hung up on exactly how to manually convert the logit scale responses back to FA responses for Group = Patient and Group = Control – Jake Niederer Feb 06 '19 at 17:27
  • @JakeNiederer This comment section is getting a bit unwieldy, I recommend taking a look at some of the publicly available tutorials/introductions to beta regression models; the fundamentals for beta models apply here too. See e.g. [here](https://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf). For example, beta regression models are naturally heteroskedastic, as you can see from my model definition. Concerning the inverse logit transformation, the same rules apply as for e.g. standard logistic regression with a logit link. – Maurits Evers Feb 07 '19 at 03:49
  • @JakeNiederer [continued] You can also use `pairs(..., type = "response")` to transform differences from the logit scale to the odds ratio scale. I've added an example at the bottom of my post. – Maurits Evers Feb 07 '19 at 03:50