4

I am trying to produce something similar to densityplot() from the lattice package, using ggplot2 after using multiple imputation with the mice package. Here is a reproducible example:

require(mice)
dt <- nhanes
impute <- mice(dt, seed = 23109)
x11()
densityplot(impute)

Which produces:

densityplot output that I want to improve in ggplot

I would like to have some more control over the output (and I am also using this as a learning exercise for ggplot). So, for the bmi variable, I tried this:

bar <- NULL
for (i in 1:impute$m) {
    foo <- complete(impute,i)
    foo$imp <- rep(i,nrow(foo))
    foo$col <- rep("#000000",nrow(foo))
    bar <- rbind(bar,foo)
}

imp <-rep(0,nrow(impute$data))
col <- rep("#D55E00", nrow(impute$data))
bar <- rbind(bar,cbind(impute$data,imp,col))
bar$imp <- as.factor(bar$imp)

x11()
ggplot(bar, aes(x=bmi, group=imp, colour=col)) + geom_density()
+ scale_fill_manual(labels=c("Observed", "Imputed"))

which produces this: enter image description here

So there are several problems with it:

  1. The colours are wrong. It seems my attempt to control the colours is completely wrong/ignored
  2. There are unwanted horizontal and vertical lines
  3. I would like the legend to show Imputed and Observed but my code gives the error invalid argument to unary operator

Moreover, it seems like quite a lot of work to do what is accomplished in one line with densityplot(impute) - so I wondered if I might be going about this in the wrong way entirely ?

Edit: I should add the fourth problem, as noted by @ROLO:

.4. The range of the plots seems to be incorrect.

Joe King
  • 2,955
  • 7
  • 29
  • 43
  • See also this post on the topic of comparing densities in r across groups: http://eeyore.ucdavis.edu/stat141/MultipleDensityPlots.html – Jeromy Anglim Sep 12 '14 at 05:07

2 Answers2

6

The reason it is more complicated using ggplot2 is that you are using densityplot from the mice package (mice::densityplot.mids to be precise - check out its code), not from lattice itself. This function has all the functionality for plotting mids result classes from mice built in. If you would try the same using lattice::densityplot, you would find it to be at least as much work as using ggplot2.

But without further ado, here is how to do it with ggplot2:

require(reshape2)
# Obtain the imputed data, together with the original data
imp <- complete(impute,"long", include=TRUE)
# Melt into long format
imp <- melt(imp, c(".imp",".id","age"))
# Add a variable for the plot legend
imp$Imputed<-ifelse(imp$".imp"==0,"Observed","Imputed")

# Plot. Be sure to use stat_density instead of geom_density in order
#  to prevent what you call "unwanted horizontal and vertical lines"
ggplot(imp, aes(x=value, group=.imp, colour=Imputed)) + 
    stat_density(geom = "path",position = "identity") +
    facet_wrap(~variable, ncol=2, scales="free")

enter image description here

But as you can see the ranges of these plots are smaller than those from densityplot. This behaviour should be controlled by parameter trim of stat_density, but this seems not to work. After fixing the code of stat_density I got the following plot:

enter image description here

Still not exactly the same as the densityplot original, but much closer.

Edit: for a true fix we'll need to wait for the next major version of ggplot2, see github.

ROLO
  • 4,183
  • 25
  • 41
  • Great. Thank you (+1). May I ask how did you fix the code to extend the range ? – Joe King Aug 21 '12 at 17:04
  • Sure. See [here](http://pastebin.com/cr329eru). Just a quick hack. I'm still figuring out package development, and when that works out and I have time I'll look for a better solution and submit it to the `ggplot2` devs. – ROLO Aug 21 '12 at 17:26
  • Thanks again. That works great. I'm sorry to ask another question, but I am a beginner with R - I understand the changes you made to the code, however can you tell me how do you find the code that you had to change ? – Joe King Aug 21 '12 at 20:47
  • I looked it up on [github](https://github.com/hadley/ggplot2/blob/master/R/stat-density.r). – ROLO Aug 21 '12 at 21:05
5

You can ask Hadley to add a fortify method for this mids class. E.g.

fortify.mids <- function(x){
 imps <- do.call(rbind, lapply(seq_len(x$m), function(i){
   data.frame(complete(x, i), Imputation = i, Imputed = "Imputed")
 }))
 orig <- cbind(x$data, Imputation = NA, Imputed = "Observed")
 rbind(imps, orig)
}

ggplot 'fortifies' non-data.frame objects prior to plotting

ggplot(fortify.mids(impute), aes(x = bmi, colour = Imputed, 
   group = Imputation)) +
geom_density() + 
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00"))

note that each ends with a '+'. Otherwise the command is expected to be complete. This is why the legend did not change. And the line starting with a '+' resulted in the error.

enter image description here

You can melt the result of fortify.mids to plot all variables in one graph

library(reshape)
Molten <- melt(fortify.mids(impute), id.vars = c("Imputation", "Imputed"))
ggplot(Molten, aes(x = value, colour = Imputed, group = Imputation)) + 
geom_density() + 
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00")) +
facet_wrap(~variable, scales = "free")

enter image description here

Thierry
  • 18,049
  • 5
  • 48
  • 66
  • This is very nice. Thank you, (+1). May I ask, is it possible to prevent the horizontal and vertical lines from plotting ?: – Joe King Aug 21 '12 at 17:03
  • geom_density() creates polygons, not lines. Hence the vertical and horzontal lines. Polygons have the benefit that you can use a colour for the outline and a fill colour. See http://had.co.nz/ggplot2/stat_density.html – Thierry Aug 22 '12 at 09:17