1

Imagine I have two separate lm objects

data(mtcars)

lm1 <- lm(mpg ~ wt, data = mtcars)
lm2 <- lm(mpg ~ wt + disp, data = mtcars)

In this case I'd like to compare both wt coefficients, and perform a hypothesis test on the null that the coefficients in both models are equal(for technical reason I need to actually have two models, rather than just including an interaction)

tomw
  • 3,114
  • 4
  • 29
  • 51
  • Your question seems to be incomplete. Do you mean something like a likelihood ratio test? E.g. `anova(lm1, lm2, test = "LRT")`? – Maurits Evers Mar 09 '18 at 04:12
  • No, I want to do a direct test on the coefficient itself, as the package `multcomp` lets you do for coefficients within a single model – tomw Mar 09 '18 at 04:13
  • So why not use the full model and then use `multcomp` to sequentially test for the significance of all predictor variables? Isn't that exactly the purpose of `multicomp`? – Maurits Evers Mar 09 '18 at 04:17
  • PS. Your question is still incomplete: *"and perform a hypothesis test on the null that the coefficients in both models (...) ..."*. – Maurits Evers Mar 09 '18 at 04:18

1 Answers1

2

Since you want to perform a hypothesis test on the estimates, I suggest a fully Bayesian model, which will get you the full posterior distribution of every variable.

rstanarm is based on Stan, and offers convenient functions that mimic the usual lm, glm syntax; if you want to know more about Stan/RStan, see here.

Based on the posterior distributions of every variable, we can then perform e.g. a t test and Kolmogorov-Smirnov test to compare the full posterior densities for every variable.

Here is what you could do:

  1. Perform the model fits.

    library(rstanarm);
    lm1 <- stan_lm(mpg ~ wt, data = mtcars, prior = NULL);
    lm2 <- stan_lm(mpg ~ wt + disp, data = mtcars, prior = NULL);
    

    Note how easy it is to run a fully Bayesian linear model with rstanarm.

  2. Extract the posterior densities for all shared coefficients (in this case, the (Intercept) and wt).

    library(tidyverse);
    shared.coef <- intersect(names(coef(lm1)), names(coef(lm2)));
    shared.coef;
    #[1] "(Intercept)" "wt"
    df1 <- lm1 %>%
        as.data.frame() %>%
        select(one_of(shared.coef)) %>%
        mutate(model = "lm1");
    df2 <- lm2 %>%
        as.data.frame() %>%
        select(one_of(shared.coef)) %>%
        mutate(model = "lm2");
    

    The posterior densities for 4000 MCMC draws are stored in two data.frames.

  3. We plot the posterior densities.

    # Plot posterior densities for all common parameters
    bind_rows(df1, df2) %>%
        gather(var, value, 1:length(shared.coef)) %>%
        ggplot(aes(value, colour = model)) +
            geom_density() +
            facet_wrap(~ var, scale = "free");
    

    enter image description here

  4. We compare the posterior density distributions of every shared parameter in a t test and a KS test. Here I'm using library broom to tidy-up the output.

    # Perform t test and KS test
    library(broom);
    res <- lapply(1:length(shared.coef), function(i)
        list(t.test(df1[, i], df2[, i]), ks.test(df1[, i], df2[, i])));
    names(res) <- shared.coef;
    lapply(res, function(x) bind_rows(sapply(x, tidy)));
    #$`(Intercept)`
    #   estimate estimate1 estimate2 statistic p.value parameter  conf.low conf.high
    #1 -4.497093  30.07725  34.57434 -104.8882       0  7155.965 -4.581141 -4.413045
    #2        NA        NA        NA    0.7725       0        NA        NA        NA
    #                              method alternative
    #1            Welch Two Sample t-test   two.sided
    #2 Two-sample Kolmogorov-Smirnov test   two-sided
    #
    #$wt
    #   estimate estimate1 estimate2 statistic      p.value parameter  conf.low
    #1 0.1825202 -3.097777 -3.280297  9.120137 1.074479e-19  4876.248 0.1432859
    #2        NA        NA        NA  0.290750 0.000000e+00        NA        NA
    #  conf.high                             method alternative
    #1 0.2217544            Welch Two Sample t-test   two.sided
    #2        NA Two-sample Kolmogorov-Smirnov test   two-sided
    #
    #There were 12 warnings (use warnings() to see them)
    

    (The warnings originate from unequal factor levels when binding rows, and can be ignored.)

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68