I'm using the mice package on two different but related data frames. The large majority of the variables are the same in both, but a small number of variables are unique to each data frame. The imputation is run separately for each data frame (they have slightly different imputation models, predictor matrices, etc.).
In the end, I would like to combine the two resulting mids objects, but as the columns differ, the standard procedure via rbind() (actually, the method rbind.mids is called) does not work.
Is there an easy way around this? I could think of two alternative approaches:
- Combine the two data frames once before imputation via dplyr::bind_rows() and split them again. Each data frame then has all columns, so rbind() would work after the imputations. However, that would also require redefining the predictor matrix and the method argument for both data frames to tell mice to ignore the new columns.
- Use mice::complete(imp_df, "long", include = TRUE) on both mids objects, combine the resulting data frames, and use mice::as.mids() to convert them back into a single mids object. But I'm not sure whether that would work or mess something else up.
Here is some example data to illustrate the issue:
library(mice)
data("nhanes2") # load test data from mice package
# make test_df 1
df_1 <- nhanes2[1:14,c(1,2,3)]
# make test_df 2
df_2 <- nhanes2[15:25, c(1,2,4)]
# quick and dirty imputation for test purpose
test_imp1 <- mice(df_1)
test_imp2 <- mice(df_2)
rbind(test_imp1, test_imp2)
Error in rbind.mids.mids(x, y, call = call) :
datasets have different variable names
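For reference, the second approach could be sketched roughly as follows on the example data. This is an untested-in-production sketch, not a confirmed solution: `pad_where`, `where_all`, `imp_1`, `imp_2`, and `combined` are names made up for illustration, and it assumes a mice version in which `as.mids()` accepts a `where` argument. The long data is sorted by `.imp` on the assumption that `as.mids()` expects each imputation block to list the rows in the same order, and `where` is passed explicitly so that cells that are only empty because the column does not exist in the other data frame are not treated as imputed.

```r
library(mice)

data("nhanes2")
df_1 <- nhanes2[1:14, c(1, 2, 3)]   # age, bmi, hyp
df_2 <- nhanes2[15:25, c(1, 2, 4)]  # age, bmi, chl
imp_1 <- mice(df_1, printFlag = FALSE, seed = 1)
imp_2 <- mice(df_2, printFlag = FALSE, seed = 2)

# Stack the long-format data; the original rows are kept as .imp == 0.
long <- dplyr::bind_rows(
  complete(imp_1, action = "long", include = TRUE),
  complete(imp_2, action = "long", include = TRUE)
)

# Sort by imputation number. order() leaves ties in their original order,
# so within each .imp block the rows of df_1 come first, then those of df_2.
long <- long[order(long$.imp), ]

vars <- setdiff(names(long), c(".imp", ".id"))

# Hypothetical helper: expand a mids object's `where` matrix to all columns,
# marking columns that the object does not have as "not imputed".
pad_where <- function(imp) {
  w <- matrix(FALSE, nrow = nrow(imp$data), ncol = length(vars),
              dimnames = list(NULL, vars))
  w[, colnames(imp$where)] <- imp$where
  w
}
where_all <- rbind(pad_where(imp_1), pad_where(imp_2))

# Convert back to a single mids object.
combined <- as.mids(long, where = where_all)
```

Two caveats under these assumptions: the columns unique to one data frame (`hyp`, `chl`) stay NA for the rows of the other data frame, so any model using them will drop those rows; and the predictor matrix and methods stored in `combined` come from the internal `mice()` call made by `as.mids()`, not from the two original imputation models.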