How to account for paired observations in statistical tests other than t-test (e.g. regression)?

Question

How to account for paired observations in statistical tests other than t-test? Below I discuss two examples in which I try to do so with a mixed-effect approach and fail.

Example 1: how to reproduce t.test(..., paired=T) in lm()?

# simulate data
set.seed(66)
x1 <- rnorm(n=100, mean=30, sd=6)
x2 <- rnorm(n=100, mean=60, sd=6)

# arrange the data in a dataset
dd <- data.frame(ID=rep(paste("ID", seq(1,100, by=1), sep="_"),2),
                        response=c(x1,x2),
                        group=c(rep("A", 100), rep("B", 100))
                        )
t.test(x1,x2, paired=F)
summary(lm(response~group, data=dd)) # same outcome

If the observations are paired one can account for it with t.test() but how to do so in lm() (if at all possible)? I tried to use a mixed-effect model approach, but:

summary(lmerTest::lmer(response~group + (1+group|ID), data=dd))

Gives an error:

Error: number of observations (=200) <= number of random effects (=200) for term (1 + group | ID);
the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable

While:

summary(lmerTest::lmer(response~group + (1|ID), data=dd))

Runs but the fixed effect parameter estimates and associated Std. Errors and t values are identical to those produced by lm().

Example 2: linear regression with paired observations

Let's imagine that the observations in the dataset I created were from subjects measured 30 days apart - namely, each of the 100 subjects were measured on day 0, then again on day 30 - and we wanted to estimate the rate of change over time:

dd$time=c(rep(0,100), rep(30, 100)) # add "time" variable to dd

Data look like this (linear regression in black, paired data linked by red lines):

lm1 <- lm(response~time, data=dd)

lm1 does not account for the paired nature of the observations. I thought of running a mixed-effect model that allowed for each pair of data to differ in intercept and slope, but R protests again that I am trying to estimate too many parameters:

lmerTest::lmer(response ~ time + (time | ID), data=dd)
# Error: number of observations (=200) <= number of random effects (=200) for term (time | ID);
# the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable

A simpler model that allows data pairs to differ in intercept but not in slope, namely:

lmer(response ~ time + (1 | ID), data=dd)

Complains that:

boundary (singular) fit: see ?isSingular

But runs and produces fixed effect estimates that are identical to those produced by lm().

[UPDATE]

@Limey reminded me that a paired t-test is nothing but a t-test assessing whether the pairwise differences between the two groups differ from zero. Such pairwise difference could be used to perform any paired statistical test beside a t test. To verify this I created three different "Response" variables that are a combination of x1 and x2 ordered in different ways (respectively: original random order; x1 in increasing and x2 in decreasing order; both in increasing order).

dd$response2 <- c(sort(x1, decreasing = FALSE), sort(x2, decreasing = T))
dd$response3 <- c(sort(x1, decreasing = FALSE), sort(x2, decreasing = F))

I calculated the corresponding differences:

dd$diff1 <- c((dd$response[1:100]-dd$response[1:100]), 
                (dd$response[101:200]-dd$response[1:100]))
dd$diff2 <- c((dd$response2[1:100]-dd$response2[1:100]), 
                (dd$response2[101:200]-dd$response2[1:100]))
dd$diff3 <- c((dd$response3[1:100]-dd$response3[1:100]), 
                (dd$response3[101:200]-dd$response3[1:100]))

And used them to perform linear models:

lm2.diff1 <- lm(diff1~time, data=dd)
lm2.diff2 <- lm(diff2 ~time, data=dd)
lm2.diff3 <- lm(diff3 ~time, data=dd)

I expected them to differ in their slope estimates, but they were all the same:

summary(lm2.diff1)$coeff[2] # 0.9993754
summary(lm2.diff2)$coeff[2] # 0.9993754
summary(lm2.diff3)$coeff[2] # 0.9993754

Their slope estimate is the same estimated from the corresponding "unpaired" linear models (lm(response~time), lm(response2~time), and lm(response3~time)). What am I missing?

https://stats.stackexchange.com/questions/23276/paired-t-test-as-a-special-case-of-linear-mixed-effect-modeling ; https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/006091.html — Ben Bolker, Apr 11 '21 at 21:02

Limey · Answer 1 · 2021-04-11T18:58:51.260

A paired t-test simply tests whether the mean value (of the differences between the two groups) is zero. So to "simulate" the results of a paired t-test by other means, simply pre-calculate the difference and pass that to your test of choice. Using your example:

x1 <- rnorm(n=100, mean=30, sd=6)
x2 <- rnorm(n=100, mean=60, sd=6)
diff <- x1-x2
dd <- data.frame(response=diff)
# Standard paired t-test
t.test(x1,x2, paired=T)


    Paired t-test

data:  x1 and x2
t = -36.167, df = 99, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -31.93760 -28.61546
sample estimates:
mean of the differences 
              -30.27653

Now a "simulated" t-test...

t.test(diff)

    One Sample t-test

data:  diff
t = -36.167, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -31.93760 -28.61546
sample estimates:
mean of x 
-30.27653

And now as a linear model

summary(lm(response~1, data=dd)) 


Call:
lm(formula = response ~ 1, data = dd)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.473  -7.328   0.614   6.101  20.764 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -30.2765     0.8371  -36.17   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.371 on 99 degrees of freedom

Of course! I'm learning yet again something that I used to know and managed to forget. Here's a funny thing though: if I code `dd$diff <- c((x1-x1),(x2-x1))` and test `lm(diff ~ time, data=dd)` the slope is exactly the same as the one estimated via `lm(response ~ time, data=dd)`. Given that `diff ~ time` accounts for "pairedness" while `response ~ time` doesn't (or shouldn't), I don't understand how it is possible. — Marco Plebani, Apr 11 '21 at 19:24
Question updated to include, in greater detail, the new can of worms mentioned above. — Marco Plebani, Apr 11 '21 at 20:26

Ben Bolker · Accepted Answer · 2021-04-12T12:42:58.397

Good question! There are a couple of tricky bits here.

We can start by calculating the paired test via t.test() and by hand:

pairedtest1 <- t.test(x1,x2, paired=TRUE)
d <- x1-x2
n <- length(x1)
tstat <- mean(d)/(sd(d)/sqrt(n))                    ## -37.58846
pval <- 2*pt(abs(tstat), lower.tail=FALSE, df=n-1)  ## 2.065802e-60
all.equal(pairedtest1$p.value,pval) ## TRUE
all.equal(unname(pairedtest1$statistic),tstat) ## TRUE

As you found out, the model with a random intercept and among-group variation in treatment is overfitted when used in a linear mixed model when there is only one observation of each treatment per group; a similar example is given here. You can force lmer to fit it if you want:

m0 <- lme4::lmer(response~group+(group|ID), data=dd, REML=TRUE,
                 control=lmerControl(check.nobs.vs.nRE="ignore", calc.deriv=FALSE))

(Note that we can also see if two models are giving equivalent fits by comparing the log-likelihoods or REML criteria - when we have overfitted models like this one, different models can get equivalent fits with different linear combinations of model parameters ...)

If we run

library(lmerTest)
coef(summary(as(m1,"lmerModLmerTest"),ddf="Kenward-Roger"))["groupB",]

(the default Satterthwaite ddf calculation fails here) we get

    Estimate   Std. Error           df      t value     Pr(>|t|) 
2.998126e+01 7.976192e-01 9.900000e+01 3.758844e+01 2.065922e-60

the t-statistic and p-value very closely match the results above (I could just have said summary() but wanted to pull out this specific row of the coefficient table).

Now let's try the reduced model

m1 <- lme4::lmer(response~group+(1|ID), data=dd, REML=TRUE)

As you noted, the fit is singular (if you check, the RE variance will be listed as 0). Here the t-statistic and p-value are a little off (at this moment, I'm not quite sure why the previous model worked!). The reason is that, for this data set, the within-group variance is slightly greater than the between-group variance. In general we expect var(between) = sigma^2_between + sigma_2_within/n, and this works asymptotically/in expectation, but in small data sets the ordering can be in the direction we observe here, in which case we would need a negative variance in order to perfectly fit the data.

resids <- with(dd,response-ave(response,group, FUN=mean))
wv <- var(resids-ave(resids, dd$ID, FUN=mean))    ## 15.82
bv <- var(tapply(resids, list(dd$ID), FUN=mean))  ## 14.92

(If we fit the same model with lme it appears OK and gives us a small [but non-zero] estimate for the among-group intercept variance, but if we try intervals(m2) it complains that the approximate var-cov matrix is not positive definite ...)

As Gang Chen pointed out in a 2011 post to r-sig-mixed-models, we can fit the model we want by allowing a negative correlation between the two observations within a group (the additive model shown above only allows non-negative correlations):

library(nlme)
m2 <- lme(response~group,random=list(ID=pdCompSymm(form=~group-1)),
          data=dd,method="REML")
all.equal(tstat^2, anova(m2)[["F-value"]][2]) ## TRUE
all.equal(pval, anova(m2)[["p-value"]][2])    ## TRUE

The p-value from anova() matches our result above, and the F-statistic matches the square of our t-statistic.

We can also do this in glmmTMB: we have to be a little bit careful because the default cs() covariance structure in glmmTMB allows different variances for each term, while corCompSymm imposes a homogeneous variance:

m3 <- glmmTMB::glmmTMB(response~group+cs(group-1|ID),
                       data=dd, REML=TRUE)
m4 <- update(m3,  map=list(theta=factor(c(1,1,2))))

(the map argument sets the first two elements of the random-effects parameter vector, which correspond to the log-sd of the variation in the first group and the second group, to be identical)

The coefficient table gets the t-statistic (which it calls a z value) correct, but doesn't have a concept of "denominator degrees of freedom", so does a Z-test rather than a t-test ...

coef(summary(m4))$cond["groupB",]
            Estimate Std. Error  z value Pr(>|z|)
groupB      29.98126  0.7976217 37.58832        0

Many thanks. I've been using MEM a lot lately, which sent me in the beautiful rabbit hole that is understanding how they actually work instead of only using them as a black box. I should look into the difference between ML and REML again, as well as understanding the nitty-gritty of var-covar matrices instead of just staring at them blankly like I would with Russian names in a Tolstoj novel (i.e. I know more or less what they stand for but if you asked me to explain it to others I wouldn't be able to). By the way, many thanks for your relentless online support to MEM users over the years. — Marco Plebani, Apr 12 '21 at 08:47

Marco Plebani · Answer 3 · 2021-04-11T22:06:57.873

Their slope estimate is the same estimated from the corresponding "unpaired" linear models (lm(response~time), lm(response2~time), and lm(response3~time)). What am I missing?

It makes sense that the overall slope is the same in the three models, being the mean of all the pairwise slopes (something that would be easy to confirm). What I was missing is that, in the case of lm2.diff3, the StdErr around the slope estimate is about one order of magnitude smaller than in lm2.diff1 and lm2.diff2. lm2.diff3 is performed on Response3 that is a lot more uniform in the behaviour of each pair of observations compared to Response1 and Response2, hence the smaller StdErr around its slope estimate.

How to account for paired observations in statistical tests other than t-test (e.g. regression)?

3 Answers3