
Please see the answer from Ben Bolker (16/05/2016) for the appropriate solution. The original question follows below.


I am fitting several multilevel models with lme4. I would like to report the variances and covariances of the random effects, and to automate this process.

I know I can get the variances with as.data.frame(VarCorr(mymodel)), and I know that I can get the confidence intervals with confint(mymodel). Clearly I can merge/rbind the two tables and put the confidence intervals around the variances simply by squaring the output of confint() at the appropriate rows and columns, but I have not been able to find a convincing method to calculate the covariances other than by hand.
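
For the variances themselves that step is straightforward; a minimal sketch (mymodel is just a placeholder for a fitted lmer model, and parm = "theta_" only restricts confint() to the random-effect parameters):

ci <- confint(mymodel, parm = "theta_", oldNames = FALSE)  # sd/cor scale
sd.rows <- grepl("^(sd|sigma)", rownames(ci))              # sd and residual rows
ci[sd.rows, ] <- ci[sd.rows, ]^2                           # now on the variance scale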

Say the result of confint is:

a <- c(6.2,-0.4,2.2,1.5,-0.4,-0.5,2.8,-0.9,1.3,3.9)
b <- c(6.8,-0.2,2.5,2.5,0.1,0.2,4.8,-0.7,2.3,5)
conf <- data.frame(a, b,
                   row.names = c("sd_(Intercept)|ID", "cor_Time.(Intercept)|ID", "sd_Time|ID",
                                 "sd_(Intercept)|Group", "cor_Time.(Intercept)|Group",
                                 "cor_I(Time^2).(Intercept)|Group", "sd_Time|Group",
                                 "cor_I(Time^2).Time|Group", "sd_I(Time^2)|Group", "sigma"))
colnames(conf) <- c("2.5%","97.5%")
conf

How can I automate the various multiplications to get the covariances such as

cov.time.intercept <- conf[2,1]*conf[1,1]*conf[3,1]  # cor_Time.(Intercept) * sd_(Intercept) * sd_Time, 2.5% column

?

I have tried splitting the standard deviations and correlations, creating "ID", "Time", "I(Time^2)" and "(Intercept)" variables, and then matching on two columns, but I am not getting anywhere. The issue is that each time the model changes you might have a different number of variances and covariances, and a different triangular matrix.
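
For illustration, a rough sketch of the matching I have in mind, applied to the `conf` object above (the name parsing is ad hoc, and whether it is even legitimate to multiply confidence limits like this is discussed in the answers below):

cov.rows <- grep("^cor_", rownames(conf))
for (r in cov.rows) {
  nm   <- rownames(conf)[r]                         # e.g. "cor_Time.(Intercept)|ID"
  grp  <- sub(".*\\|", "", nm)                      # grouping factor, e.g. "ID"
  vars <- strsplit(sub("^cor_", "", sub("\\|.*", "", nm)), ".", fixed = TRUE)[[1]]
  sd1  <- conf[paste0("sd_", vars[1], "|", grp), ]  # matching sd rows
  sd2  <- conf[paste0("sd_", vars[2], "|", grp), ]
  print(conf[r, ] * sd1 * sd2)                      # cor * sd * sd, column-wise
}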

Thank you for any help,

k.

r.kaiza
  • can you state a little more clearly what you're trying to do? Do you want confidence intervals on the variances and covariances? Or do you want variances and covariances of the variances and covariances themselves? As @Thierry suggests below, I think there are some issues you need to sort out/clarify *before* trying to find the right computational framework. – Ben Bolker May 13 '16 at 12:29
  • Hi Ben, thank you for your reply and apologies if that was not clear. Indeed, I would like to express the confidence intervals as variances and covariances, not as standard deviations and correlations as the default of `lme4::confint()` is. – r.kaiza May 13 '16 at 13:07
  • Just a quick note that getting confidence intervals of random effects on the variance-covariance scale is indeed not trivial; I'm working on some mods to the package that should help with that. – Ben Bolker May 13 '16 at 18:19
  • Hi Ben, I don't understand, isn't it just a matter of taking the standardised results and squaring/multiplying by the SDs? Best wishes – r.kaiza May 14 '16 at 20:05
  • not if you want profile confidence intervals (still working on it). – Ben Bolker May 14 '16 at 20:08
  • by the way, I think your title would make sense as "confidence intervals of covariances" rather than the way you have it ... – Ben Bolker May 15 '16 at 20:41
  • Amended, thank you for pointing this out. – r.kaiza May 17 '16 at 08:33

3 Answers


Solved, thank you for contributing. I will update the initial post. The outcome can be tested with the dataset from Snijders & Bosker available here.

Import with

library(foreign)
chap12 <- read.dta(file = "<your path>/ch12.dta")

An improvised model:

library(lme4)
snijders <- lmer(prox_pup ~ 1 + prox_sel + (1 + occ|teacher), data = chap12)

Source the function:

ExtractVarCovCI <- function(Model) {

  ## Extract variances and covariances of the random effects (lower-triangle order)
  v <- as.data.frame(VarCorr(Model), order = "lower.tri")

  ## Profile confidence intervals for the random-effect parameters
  conf <- confint(Model, parm = "theta_", oldNames = FALSE)

  ## Bind the confidence intervals to the variance-covariance table
  v.conf <- cbind(v, conf)

  ## Separate the variance components (var2 is NA) from the covariance components
  covs <- as.data.frame(v.conf[!is.na(v[, 3]), ])
  vars <- as.data.frame(v.conf[is.na(v[, 3]), ])

  ## Square the confidence limits of the standard deviations to get variances
  vars.sq <- vars[, 6:7]^2
  colnames(vars.sq) <- sub("[%]", "% sq.", colnames(vars.sq))
  vars2 <- cbind(vars, vars.sq)

  ## Empty columns to hold the covariance limits later
  covs$`2.5 % sq.`  <- rep(NA, nrow(covs))
  covs$`97.5 % sq.` <- rep(NA, nrow(covs))

  ## Re-organise the table so that each covariance sits below the
  ## variances of the two variables it involves
  lcovs <- nrow(covs)
  k <- NULL
  for (i in seq_len(lcovs)) {
    k <- rbind(k,
               vars2[vars2$grp %in% covs[i, 1] & vars2$var1 %in% covs[i, 2], ],
               vars2[vars2$grp %in% covs[i, 1] & vars2$var1 %in% covs[i, 3], ],
               covs[i, ])
  }

  ## Bind the level-1 residual (sigma) at the end
  k2 <- rbind(k, vars2["sigma", ])

  ## Covariance limit = correlation limit * sd limit * sd limit
  k2.covrow <- grep("^cor", rownames(k2))                                             # covariance row positions
  k2[k2.covrow, 8] <- k2[k2.covrow, 6] * k2[k2.covrow - 1, 6] * k2[k2.covrow - 2, 6]  # 2.5 %
  k2[k2.covrow, 9] <- k2[k2.covrow, 7] * k2[k2.covrow - 1, 7] * k2[k2.covrow - 2, 7]  # 97.5 %

  ## Keep only the estimates and the confidence limits
  p <- k2[, c(4, 8:9)]
  rownames(p) <- sub("^sd", "var", rownames(p))   # now it's clear these are proper variances ...
  rownames(p) <- sub("^cor", "cov", rownames(p))  # ... and covariances
  colnames(p) <- c("Estimate", "2.5%", "97.5%")

  return(p)
}

Run the function:

ExtractVarCovCI(snijders)

My output was:

                               Estimate         2.5%       97.5%
var_(Intercept)|teacher      0.15617962  0.089020350  0.26130969
var_occ|teacher              0.01205317  0.002467408  0.02779329
cov_occ.(Intercept)|teacher -0.03883458 -0.014820577 -0.05887660
sigma                        0.04979762  0.034631759  0.07263837

Now we have a variance-covariance table that reports the unstandardised random effects with their lower and upper confidence boundaries. I am sure there are better ways of doing this, but it's a start...

k.

r.kaiza

Your computation seems to give plausible answers, but it doesn't make sense (to me; I stand ready to be corrected/enlightened ...). Suppose cov = corr*sd1*sd2, and let ci(.) denote the (lower or upper) confidence limit of a quantity. It is by no means true that ci(cov) = ci(corr)*ci(sd1)*ci(sd2) (it is interesting that you get reasonable answers; I think this is most likely to happen when the quantities are approximately uncorrelated ...). If you had the variances of each component and the covariances among them (I don't mean the random-effect variances and covariances themselves, but their sampling variances/covariances), you could propagate them approximately using the delta method, but these are fairly hard to get (see here).
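
For illustration only, a rough sketch of that delta-method propagation, assuming you somehow had the 3x3 sampling variance-covariance matrix V of (corr, sd1, sd2); the helper name and the Wald-style ±1.96 limits are made up for the example, not something lme4 gives you:

## hypothetical helper: first-order delta method for cov = corr*sd1*sd2
delta_cov_ci <- function(corr, sd1, sd2, V) {
  g    <- corr * sd1 * sd2                      # point estimate of the covariance
  grad <- c(sd1 * sd2, corr * sd2, corr * sd1)  # gradient wrt (corr, sd1, sd2)
  se   <- sqrt(drop(t(grad) %*% V %*% grad))    # approximate standard error
  c(est = g, lwr = g - 1.96 * se, upr = g + 1.96 * se)
}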

The "right" way to do this, as far as I can tell, is do the likelihood profile computations on the variance-covariance scale instead of the standard deviation-correlation scale. This wasn't previously possible, but it is now (with the development version on Github).

Install latest version:

library(remotes) ## for install_github (or library(devtools))
install_github("lme4/lme4")

Preliminaries:

chap12 <- foreign::read.dta(file = "ch12.dta")
library(lme4)
snijders <- lmer(prox_pup ~ 1 + prox_sel + (1 + occ|teacher),
                 data = chap12)

as.data.frame(VarCorr(snijders))
##        grp        var1 var2        vcov      sdcor
## 1  teacher (Intercept) <NA>  0.15617962  0.3951957
## 2  teacher         occ <NA>  0.01205317  0.1097869
## 3  teacher (Intercept)  occ -0.03883458 -0.8950676
## 4 Residual        <NA> <NA>  0.04979762  0.2231538

We have to be a bit careful when comparing results because profile.merMod, which we'll use shortly, automatically (and silently!) converts fits from the default REML to maximum likelihood fits (because a profile based on REML might be statistically dicey); however, it doesn't look like this makes a huge difference.

s2 <- refitML(snijders)
as.data.frame(VarCorr(s2))
##        grp        var1 var2        vcov      sdcor
## 1  teacher (Intercept) <NA>  0.15426049  0.3927601
## 2  teacher         occ <NA>  0.01202631  0.1096645
## 3  teacher (Intercept)  occ -0.03884427 -0.9018483
## 4 Residual        <NA> <NA>  0.04955549  0.2226106

p.sd   <- profile(s2, which = "theta_", signames = FALSE)
p.vcov <- profile(s2, which = "theta_", prof.scale = "varcov", signames = FALSE)

We get some warnings about non-monotonic profiles ...

confint(p.vcov)
##                                    2.5 %     97.5 %
## var_(Intercept)|teacher      0.08888931  0.26131067
## cov_occ.(Intercept)|teacher -0.07553263 -0.01589043
## var_occ|teacher              0.00000000  0.02783863
## sigma                        0.03463184  0.07258777

What if we check against the square of the relevant (sd/variance) elements?

confint(p.sd)[c(1,3,4),]^2
##                              2.5 %     97.5 %
## sd_(Intercept)|teacher 0.089089363 0.26130970
## sd_occ|teacher         0.002467408 0.02779329
## sigma                  0.034631759 0.07263869

These match pretty well, except for the lower bound of the occ variance; they also match your results above. However, the covariance result (which is the one that I claim is difficult) gives (-0.0755,-0.0159) for me, vs. (-0.0588,-0.0148) for you, about a 20% difference. This might not be a big deal, depending on what you're trying to do.

Let's try brute force too:

sumfun <- function(x) {
    vv <- as.data.frame(VarCorr(x),order="lower.tri")[,"vcov"]
    ## cheating a bit here, using internal lme4 naming functions ...
    return(setNames(vv,
       c(lme4:::tnames(x,old=FALSE,prefix=c("var","cov")),
         "sigmasq")))
}

cc <- confint(s2,method="boot",nsim=1000,FUN=sumfun,seed=101,
        .progress="txt", PBargs=list(style=3))
## .progress/PBargs just cosmetic ...

##                                    2.5 %      97.5 %
## var_(Intercept)|teacher      0.079429623  0.24053633
## cov_occ.(Intercept)|teacher -0.067063911 -0.01479572
## var_occ|teacher              0.002733402  0.02378310
## sigmasq                      0.031952508  0.06736664

The "gold standard" here seems to be partway between my profile result and your result: lower bound on covariance is -0.067 here vs. -0.0755 (profile) or -0.0588.

Ben Bolker
  • Thank you for your insightful and informative answer, Ben. For the sake of comparison, with bootstrapping (nsim = 50) I get (-0.014,-0.042), but if your method is the more theoretically sound, then I will certainly use that and put a sticky note in my OP. Many thanks to the person that made this possible in the source code of the package, too. k. – r.kaiza May 17 '16 at 09:09
  • Ben, I have tried your input after updating to lme4 1.1-13 but I receive the following error: `s2 <- refitML(snijders) Error in assign(field, forceCopy(current), envir = vEnv) : could not find function "forceCopy"` Am I missing a package? Thanks, k. – r.kaiza May 17 '16 at 13:05
  • huh, that's weird. – Ben Bolker May 17 '16 at 13:08
  • Can I provide any information to help you track where it got stuck? – r.kaiza May 17 '16 at 13:11
  • results of `sessionInfo()` ? – Ben Bolker May 17 '16 at 13:12
  • My bad, I first refitted the model using the git version of the package and it works. Sorry. – r.kaiza May 17 '16 at 13:13
  • hmm, works for me on fresh install of lme4 1.1-13 from Github on MacOS – Ben Bolker May 17 '16 at 13:14
  • Yes, I had to re-fit the _snijders_ model first and then it worked, I guess the function is introduced in the lmer call? – r.kaiza May 17 '16 at 13:16

Please note that the standard deviation of the random effects in the lme4 summary is NOT the standard error of the variance! It's just the square root of the variance!

If you need confidence intervals on the variances of the random effects, then you'll need to profile() the likelihood. See ?lme4::profile.merMod.
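
A minimal sketch, with mymodel standing in for your fitted lmer model:

pp <- profile(mymodel, which = "theta_", signames = FALSE)  # profile the random-effect parameters
confint(pp)  # intervals on the sd/cor scale; square the sd_/sigma rows for variances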

Thierry
  • Hi Thierry, thank you for your reply. I am aware that the standard deviation of the random effects is not the standard error of the variance, but the square root of the variance. This is why I am calculating the confidence intervals with `confint()`. However, `confint()` reports the intervals on the standard-deviation/correlation scale. Squaring them makes sense for the variances, but for the covariances you have to use the standard deviations and correlations, and I am trying to automate this process. – r.kaiza May 13 '16 at 13:05
  • this seems like a comment/not what the OP wanted (now that they have clarified) ... – Ben Bolker May 15 '16 at 22:21