
I am working with survey data that use probability weights and multiple imputation. I would like to get marginal effects after estimating a logit model on the imputed data sets with the survey weights, but I cannot figure out how to do this in R. Stata has the package mimrgns, which makes it pretty easy. There are also this article (pdf) and its supplementary material (pdf), which give some direction, but I can't seem to apply them to my situation.

In the following example, please assume I already imputed "income" across three data sets (i.e., df1, df2, and df3). I would like to predict "gender" using employment status (i.e., working) and "income."

Here is a reproducible example.

library(tibble)
library(survey)
library(mitools)
library(ggeffects)

# Data set 1
# Note that I exclude "income" from the "df"s and create it separately so
# that it varies between the data sets. This simulates the variation from
# multiple imputation. Because I use the same seed (i.e., 123) for every
# "df", all other variables are identical across data sets; only "income"
# varies.

set.seed(123)

df1 <- tibble(id      = seq(1, 100, by = 1),
              gender  = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100,  replace   = TRUE))


# Create random income variable.

set.seed(456)

income <- tibble(income = sample(0:100000, 100))

# Bind it to df1

df1 <- cbind(df1, income)


# Data set 2

set.seed(123)

df2 <- tibble(id      = seq(1, 100, by = 1),
              gender  = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100,  replace   = TRUE))

set.seed(789)

income <- tibble(income = sample(0:100000, 100))

df2 <- cbind(df2, income)


# Data set 3

set.seed(123)

df3 <- tibble(id      = seq(1, 100, by = 1),
              gender  = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100,  replace   = TRUE))

set.seed(101)

income <- tibble(income = sample(0:100000, 100))

df3 <- cbind(df3, income)


# Apply weights via svydesign

imputation <- svydesign(id      = ~id,
                        weights = ~pweight,
                        data    = imputationList(list(df1, 
                                                      df2, 
                                                      df3)))


# Logit model with weights and imputations

logitImp <- with(imputation, svyglm(gender ~ working + income,
                             family = binomial()))


# Combine results across MI datasets

summary(MIcombine(logitImp))
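
For context, the pooling step above follows Rubin's rules, as far as I understand `MIcombine`: the pooled estimate is the mean of the per-imputation estimates, and its variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. Here is a minimal base-R sketch with made-up numbers (the estimates and variances are hypothetical, not output from the model above):

```r
# Hypothetical estimates of one coefficient from m = 3 imputed data sets,
# with their squared standard errors (illustrative values only).
est        <- c(0.50, 0.60, 0.40)
var_within <- c(0.04, 0.05, 0.03)
m <- length(est)

pooled_est <- mean(est)           # pooled point estimate
W <- mean(var_within)             # average within-imputation variance
B <- var(est)                     # between-imputation variance
total_var <- W + (1 + 1 / m) * B  # Rubin's total variance
pooled_se <- sqrt(total_var)

c(estimate = pooled_est, se = pooled_se)
```

Any quantity that comes with an estimate and a variance in each imputed data set can, in principle, be pooled the same way.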

Normally I would use library(ggeffects) to get marginal effects, but when I try it with the imputed data I get the following error: `Error in class(model) <- "lmerMod" : attempt to set an attribute on NULL`. Here is an example of how I would do it without the imputation, using "df1" as the data set.

# Create new svydesign variable

noImp <- svydesign(id      = ~id,
                   weights = ~pweight, 
                   data    = df1)


# Run model

logit <- svyglm(gender ~ working + income,
                family = binomial,
                design = noImp)


# Get marginal effects at the mean

ggpredict(logit, terms = "working")

Any idea how to do this with multiple imputation?

scottsmith
  • do you want `?survey::marginpred` not `ggpredict` ? – Anthony Damico Feb 01 '18 at 09:02
  • Thanks, @AnthonyDamico. When I try using the form of `marginpred(model, adjustfor, predictat, ...)`, I get the following error `Error in UseMethod("marginpred", model) : no applicable method for 'marginpred' applied to an object of class "list"`. It seems this is an active area of research and there isn't firm guidance on how to get marginal predictions with multiple imputation. Some journal articles will take one of the imputed data sets and predict with that, noting that it is more for illustrative purposes and shouldn't be treated as exact. – scottsmith Feb 01 '18 at 20:55
  • hi, i'm almost certain `ggpredict` does not account for the complex sampling design. maybe edit your example to show a successful use of `marginpred` on a single implicate and make clear what triggers the error when extending to a second implicate. thanks – Anthony Damico Feb 01 '18 at 21:56
  • _ggeffects_ actually works for survey-models as well, but cannot deal with pooled results from multiple models on imputed data. I'll check if I can implement such feature. – Daniel Feb 21 '18 at 09:08
  • @Daniel any luck in implementing that feature? Thanks! – Juan C Jul 03 '21 at 16:34
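
One possible workaround, sketched under assumptions rather than offered as an established method: fit `svyglm` separately in each imputed data set, get predictions on the link scale with `predict()` for `working` = 0 and 1 with `income` held at its weighted mean, pool them with `MIcombine()`, and back-transform with `plogis()`. The helper names `make_df` and `predict_one`, the simulated data, and the choices to pool on the logit scale and to use `quasibinomial()` (to avoid warnings about non-integer weights) are illustrative assumptions, not something from the thread. The result may not match what `mimrgns` reports.

```r
library(survey)
library(mitools)

# Hypothetical stand-in for the three imputed data sets: everything is
# identical across data sets except "income".
make_df <- function(income_seed) {
  set.seed(123)
  df <- data.frame(
    id      = 1:100,
    gender  = factor(rbinom(100, size = 1, prob = 0.50)),
    working = factor(rbinom(100, size = 1, prob = 0.40)),
    pweight = sample(50:500, 100, replace = TRUE)
  )
  set.seed(income_seed)
  df$income <- sample(0:100000, 100)
  df
}
dfs <- lapply(c(456, 789, 101), make_df)

# Fit the weighted logit in one data set and predict on the link scale.
predict_one <- function(df) {
  des <- svydesign(ids = ~id, weights = ~pweight, data = df)
  fit <- svyglm(gender ~ working + income, design = des,
                family = quasibinomial())
  newdata <- data.frame(
    working = factor(c("0", "1"), levels = levels(df$working)),
    income  = weighted.mean(df$income, df$pweight)
  )
  # vcov = TRUE returns the full covariance matrix of the two predictions
  predict(fit, newdata = newdata, type = "link", vcov = TRUE)
}
preds <- lapply(dfs, predict_one)

# Pool the two predictions across imputations with Rubin's rules.
pooled <- MIcombine(results   = lapply(preds, coef),
                    variances = lapply(preds, vcov))

est <- pooled$coefficients
se  <- sqrt(diag(pooled$variance))

# Back-transform to probabilities with approximate 95% intervals.
data.frame(working   = c("0", "1"),
           predicted = plogis(est),
           conf.low  = plogis(est - 1.96 * se),
           conf.high = plogis(est + 1.96 * se))
```

Pooling on the link scale and transforming afterwards keeps the intervals inside (0, 1). Pooling on the response scale (`type = "response"`) would be the other obvious choice, and the two need not agree exactly.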

0 Answers