Pooled average marginal effects from survey-weighted and multiple-imputed data

Question

I am working with survey data and their associated weights, in addition to missing data that I imputed using mice(). The model I'm eventually running contains complex interactions between variables for which I want the average marginal effect.

This task seems trivial in STATA, but I'd rather stay in R since that's what I know best. It seems easy to retrieve AME's for each separate imputed dataset and average the estimates. However, I need to make use of pool() (from mice) to make sure I'm getting the correct standard errors.

Here is a reproducible example:

library(tidyverse)
library(survey)
library(mice)
library(margins)

df <- tibble(y = c(0, 5, 0, 4, 0, 1, 2, 3, 1, 12), region = c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3), 
             weight = c(7213, 2142, 1331, 4342, 9843, 1231, 1235, 2131, 7548, 2348), 
             x1 = c(1.14, 2.42, -0.34, 0.12, -0.9, -1.2, 0.67, 1.24, 0.25, -0.3),
             x2 = c(12, NA, 10, NA, NA, 12, 11, 8, 9, 9))

Using margins() on a simple (non-multiple) svyglm works without a hitch. Running svyglm on each imputation using which() and pooling the results also works well.

m <- with(surv_obj, svyglm(y ~ x1 * x2))
pool(m)

However, wrapping margins() into which() returns an error "Error in .svycheck(design) : argument "design" is missing, with no default"

with(surv_obj, margins(svyglm(y ~ x1 * x2), design = surv_obj))

If I specify the design in the svyglm call, I get "Error in UseMethod("svyglm", design) : no applicable method for 'svyglm' applied to an object of class "svyimputationList""

with(surv_obj, margins(svyglm(y ~ x1 * x2, design = surv_obj), design = surv_obj))

If I drop the survey layer, and simply try to run the margins on each imputed set and then pool, I get a warning: "Warning in get.dfcom(object, dfcom) : Infinite sample size assumed.".

m1 <- with(imputed_df, margins(lm(y ~ x1 * x2)))
pool(m1)

This worries me given that pool() may use sample size in its calculations.

Does anyone know of any method to either (a) use which(), margins() and pool() to retrieve the pooled average marginal effects or (b) knows what elements of margins() I should pass to pool() (or pool.scalar()) to achieve the desired result?

A few comments. (1) the warning about sample size probably just means that p and CI are computed using a normal distribution rather than a t. If your sample is big enough, that probably doesn't matter. (2) The `margins` package is no longer actively developed, so it probably won't support this kind of model anytime soon. (3) You may want to try the newer `marginaleffects` package, which works very similarly, but supports more models. (Disclaimer: I am the author.) Here is a vignette on multiple imputation: https://vincentarelbundock.github.io/marginaleffects/articles/multiple_imputation.html — Vincent, Jun 26 '22 at 03:42
Updated my post to reflect your comment, new package, and the issue being resolved. Let me know if you think my integration of survey() into your vignette code is unreasonable. And thank you for the wonderful work on marginaleffects()! — Gabriel Varela, Jun 27 '22 at 23:59
Looks good to me. Glad it helped! Another option is to answer your own question and to check your answer as accepted. That way, people reading unaccepted answers won't spend time on an already answered Q. Either way, nice job! — Vincent, Jun 28 '22 at 00:26

score 1 · Accepted Answer · answered Jun 28 '22 at 00:58

Update following Vincent's comment

Wanted to update this post following Vincent's comment and related package marginaleffects() which ended up fixing my issue. Hopefully, this will be helpful to others stuck on similar problems.

I implemented the code in the vignette linked in Vincent's comment, adding a few steps that allow for survey weighting and modeling. It's worth noting that svydesign() will drop any observations missing on clustering/weighting variables, so marginaleffects() can't predict values back unto the original "dat" data and will throw up an error. Pooling my actual data still throws up an "infinite sample size assumed", which (as noted) should be fine but I'm still looking into fixes.

library(tidyverse)
library(survey)
library(mice)
library(marginaleffects)

fit_reg <- function(dat) {
  
    svy <- svydesign(ids = ~ 1, cluster = ~ region, weight = ~weight, data = dat)
    mod <- svyglm(y ~ x1 + x2*factor(x3), design = svy)
    out <- marginaleffects(mod, newdata = dat)
    
    class(out) <- c("custom", class(out))
    return(out)
}

tidy.custom <- function(x, ...) {
    out <- marginaleffects:::tidy.marginaleffects(x, ...)
    out$term <- paste(out$term, out$contrast)
    return(out)
}

df <- tibble(y = c(0, 5, 0, 4, 0, 1, 2, 3, 1, 12), region = c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3), 
             weight = c(7213, 2142, 1331, 4342, 9843, 1231, 1235, 2131, 7548, 2348), 
             x1 = c(1.14, 2.42, -0.34, 0.12, -0.9, -1.2, 0.67, 1.24, 0.25, -0.3),
             x2 = c(12, NA, 10, NA, NA, 12, 11, 8, 9, 9),
             x3 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

imputed_df <- mice(df, m = 2, seed = 123)

dat_mice <- complete(imputed_df, "all")
mod_imputation <- lapply(dat_mice, fit_reg)
mod_imputation <- pool(mod_imputation)

summary(mod_imputation)

Pooled average marginal effects from survey-weighted and multiple-imputed data

1 Answers1

Update following Vincent's comment