Perform hypothesis testing on coefficients from separate linear models

Question

Imagine I have two separate lm objects

data(mtcars)

lm1 <- lm(mpg ~ wt, data = mtcars)
lm2 <- lm(mpg ~ wt + disp, data = mtcars)

In this case I'd like to compare both wt coefficients, and perform a hypothesis test on the null that the coefficients in both models are equal(for technical reason I need to actually have two models, rather than just including an interaction)

Your question seems to be incomplete. Do you mean something like a likelihood ratio test? E.g. `anova(lm1, lm2, test = "LRT")`? — Maurits Evers, Mar 09 '18 at 04:12
No, I want to do a direct test on the coefficient itself, as the package `multcomp` lets you do for coefficients within a single model — tomw, Mar 09 '18 at 04:13
So why not use the full model and then use `multcomp` to sequentially test for the significance of all predictor variables? Isn't that exactly the purpose of `multicomp`? — Maurits Evers, Mar 09 '18 at 04:17
PS. Your question is still incomplete: *"and perform a hypothesis test on the null that the coefficients in both models (...) ..."*. — Maurits Evers, Mar 09 '18 at 04:18

score 2 · Accepted Answer · answered Mar 09 '18 at 07:15

Since you want to perform a hypothesis test on the estimates, I suggest a fully Bayesian model, which will get you the full posterior distribution of every variable.

rstanarm is based on Stan, and offers convenient functions that mimic the usual lm, glm syntax; if you want to know more about Stan/RStan, see here.

Based on the posterior distributions of every variable, we can then perform e.g. a t test and Kolmogorov-Smirnov test to compare the full posterior densities for every variable.

Here is what you could do:

Perform the model fits.

library(rstanarm);
lm1 <- stan_lm(mpg ~ wt, data = mtcars, prior = NULL);
lm2 <- stan_lm(mpg ~ wt + disp, data = mtcars, prior = NULL);

Note how easy it is to run a fully Bayesian linear model with rstanarm.

Extract the posterior densities for all shared coefficients (in this case, the (Intercept) and wt).

library(tidyverse);
shared.coef <- intersect(names(coef(lm1)), names(coef(lm2)));
shared.coef;
#[1] "(Intercept)" "wt"
df1 <- lm1 %>%
    as.data.frame() %>%
    select(one_of(shared.coef)) %>%
    mutate(model = "lm1");
df2 <- lm2 %>%
    as.data.frame() %>%
    select(one_of(shared.coef)) %>%
    mutate(model = "lm2");

The posterior densities for 4000 MCMC draws are stored in two data.frames.

We plot the posterior densities.

# Plot posterior densities for all common parameters
bind_rows(df1, df2) %>%
    gather(var, value, 1:length(shared.coef)) %>%
    ggplot(aes(value, colour = model)) +
        geom_density() +
        facet_wrap(~ var, scale = "free");

We compare the posterior density distributions of every shared parameter in a t test and a KS test. Here I'm using library broom to tidy-up the output.

# Perform t test and KS test
library(broom);
res <- lapply(1:length(shared.coef), function(i)
    list(t.test(df1[, i], df2[, i]), ks.test(df1[, i], df2[, i])));
names(res) <- shared.coef;
lapply(res, function(x) bind_rows(sapply(x, tidy)));
#$`(Intercept)`
#   estimate estimate1 estimate2 statistic p.value parameter  conf.low conf.high
#1 -4.497093  30.07725  34.57434 -104.8882       0  7155.965 -4.581141 -4.413045
#2        NA        NA        NA    0.7725       0        NA        NA        NA
#                              method alternative
#1            Welch Two Sample t-test   two.sided
#2 Two-sample Kolmogorov-Smirnov test   two-sided
#
#$wt
#   estimate estimate1 estimate2 statistic      p.value parameter  conf.low
#1 0.1825202 -3.097777 -3.280297  9.120137 1.074479e-19  4876.248 0.1432859
#2        NA        NA        NA  0.290750 0.000000e+00        NA        NA
#  conf.high                             method alternative
#1 0.2217544            Welch Two Sample t-test   two.sided
#2        NA Two-sample Kolmogorov-Smirnov test   two-sided
#
#There were 12 warnings (use warnings() to see them)

(The warnings originate from unequal factor levels when binding rows, and can be ignored.)

Perform hypothesis testing on coefficients from separate linear models

1 Answers1