
I'm using the mice package in R to do multiple imputation. I've done several imputations with only numerical variables, using predictive mean matching (PMM) as the imputation method, and when I use stripplot(imp) I can see the observed and imputed values of all the variables.

The problem occurs when I impute a combination of categorical and numerical variables. The imputation method is then PMM for the numerical variables and logistic regression (logreg) for the categorical ones, and stripplot only shows me the numerical variables. Using the code below, I tried to force edu, a categorical variable with two levels, to be plotted:

stripplot(imp, imp$edu)
stripplot(imp, names(imp$edu))

And I got this error:

Error in stripplot.mids(imp, imp$edu) : Cannot pad extended formula.

Does anyone know how I can plot the observed and imputed values for both the numerical and the categorical variables?

slamballais

2 Answers


One thing you can try is to retrieve the imputed datasets as a data.frame and use ordinary plotting functions. First, retrieve the completed datasets together with the original dataset containing the missing values (imp is the mids object, i.e. the result of running mice()):

impL <- mice::complete(imp, "long", include = TRUE)  # rows with .imp == 0 are the original data

Next, add a factor indicating which rows come from the imputed datasets (.imp > 0) and which from the original data (.imp == 0):

impL$Imputed <- factor(impL$.imp > 0, labels = c("Observed", "Imputed"))
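A minimal, self-contained sketch of the two steps above, assuming mice is installed. The toy data frame df with columns a and b is hypothetical and only stands in for your own data:

```r
library(mice)

set.seed(1)
# Hypothetical toy data: a binary factor `a` and a numeric `b`, 10 NAs each
df <- data.frame(a = factor(rbinom(100, 1, 0.5)), b = rnorm(100, 5, 1))
df$a[sample(100, 10)] <- NA
df$b[sample(100, 10)] <- NA

# Default methods: logreg for the binary factor, pmm for the numeric column
imp <- mice(df, m = 5, printFlag = FALSE)

# Stack the original data (.imp == 0) on top of the 5 completed datasets
impL <- mice::complete(imp, "long", include = TRUE)

# FALSE sorts before TRUE, so FALSE -> "Observed" and TRUE -> "Imputed"
impL$Imputed <- factor(impL$.imp > 0, labels = c("Observed", "Imputed"))

table(impL$Imputed)
```

With 100 rows and m = 5, the long data frame holds 100 original rows plus 500 completed rows.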

Then you can use ordinary plotting functions for each variable, which also lets you create nicer plots. For example, using ggplot() from the ggplot2 package to create a bar plot of a categorical variable:

library(ggplot2)
ggplot(impL[!is.na(impL$var1), ], aes(x = var1)) +
  geom_bar(aes(y = after_stat(prop), group = Imputed)) +  # after_stat(prop) replaces the deprecated ..prop..
  facet_wrap(Imputed ~ ., ncol = 1, nrow = 2)

The !is.na() filter avoids plotting a bar for NA. var1 stands for the categorical variable you want to plot. For a continuous variable, you might create a density plot instead:

ggplot(impL, aes(x = var2, colour = Imputed)) + geom_density()

To draw a separate curve for each imputed dataset, add group = .imp inside aes(). Hope this helps.
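Note that the Imputed group in impL also contains the observed values, which are copied into every completed dataset. To compare only the values that mice actually filled in, one option is the imp component of the mids object, which stores the imputed values per variable (one row per missing cell, one column per imputation). A minimal sketch on hypothetical toy data (a and b are illustrative names, not from the question), comparing level proportions of a categorical variable numerically:

```r
library(mice)

set.seed(1)
# Hypothetical toy data: a binary factor `a` and a numeric `b`, 10 NAs each
df <- data.frame(a = factor(rbinom(100, 1, 0.5)), b = rnorm(100, 5, 1))
df$a[sample(100, 10)] <- NA
df$b[sample(100, 10)] <- NA
imp <- mice(df, m = 5, printFlag = FALSE)

# Values that were actually observed for `a`
obs_a <- df$a[!is.na(df$a)]

# imp$imp$a holds only the filled-in values; going through character
# makes this work whether the stored columns are factors or strings
imp_a <- factor(unlist(lapply(imp$imp$a, as.character), use.names = FALSE),
                levels = levels(df$a))

# Share of each level among observed vs. imputed values
props <- rbind(Observed = prop.table(table(obs_a)),
               Imputed  = prop.table(table(imp_a)))
props
```

If the imputation model is reasonable, the two rows are often similar, although they need not match when the data are not missing completely at random.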

Niek
    Thanks, @Niek. However, the facetplot code gives an error: `"unexpected ',' in: ggplot(impL[which(!is.na(long_df$var1)),],aes(x = var1)) + geom_bar(aes(y = ..prop.., group = Imputed)) + facet_wrap(Imputed ~,"` ... – ayePete Nov 19 '19 at 08:31
  • Thanks for the comment, I forgot to add a `.` to the code. With `+ facet_wrap(Imputed ~ .,ncol=1,nrow=2)` this should work – Niek Nov 19 '19 at 08:54

I just had a similar issue, so I figured I might post an answer that achieves your goal without having to extract the imputed data.

library(mice)

set.seed(123)  # make the random data and imputations reproducible

# Create dataset holding numerical and categorical data
a <- as.factor(rbinom(100, 1, 0.5))
b <- rnorm(100, 5, 1)
df <- data.frame(a, b)

# Randomly assign 10 NA values to each column
df$a[sample(length(df$a), 10)] <- NA
df$b[sample(length(df$b), 10)] <- NA

# Impute with pmm (numerical) and logreg (categorical)
init <- mice(df, maxit = 0)
meth <- init$method
meth["a"] <- "logreg"
imp <- mice(df, method = meth)

# This only plots b, the numerical
stripplot(imp)

# This plots both, as included below
stripplot(imp, a + b ~ .imp)

[Figure: output of stripplot(imp, a + b ~ .imp), showing observed and imputed values for both a and b]
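This also explains why the attempts in the question fail: the second argument of stripplot() for a mids object is a formula such as a + b ~ .imp, not a vector of values or names. With many variables, typing the formula by hand gets tedious. A sketch that builds it from the names of all incompletely observed variables in imp$data, assuming mice and lattice are installed (the toy data, including the complete column x, is hypothetical):

```r
library(mice)
library(lattice)

set.seed(1)
# Hypothetical toy data; `x` has no missing values and should be skipped
df <- data.frame(a = factor(rbinom(100, 1, 0.5)), b = rnorm(100, 5, 1),
                 x = rnorm(100))
df$a[sample(100, 10)] <- NA
df$b[sample(100, 10)] <- NA
imp <- mice(df, printFlag = FALSE)

# Variables with at least one missing value in the original data (imp$data)
vars <- names(imp$data)[colSums(is.na(imp$data)) > 0]

# Build "a + b ~ .imp" without typing the names by hand
fml <- as.formula(paste(paste(vars, collapse = " + "), "~ .imp"))

# Store the trellis object, then draw it explicitly
p <- stripplot(imp, fml)
print(p)
```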

humperderp