
I have a data set with a continuous variable and a factor with n levels.

I'd like to plot an empirical cumulative distribution function (ECDF) for each level separately, plus the overall ECDF in each panel. The point is to compare each subset's (level's) ECDF with the ECDF of the full data set.

Plotting all of the ECDFs in one panel is easy (using the "diamonds" data set as an example):

library(ggplot2)

ggplot(diamonds) + 
  stat_ecdf(aes(x=carat, colour = color)) + 
  stat_ecdf(aes(x=carat), lwd=1, linetype="dotted")

But when I use faceting to separate the levels into panels:

ggplot(diamonds) + 
  stat_ecdf(aes(x=carat, colour = color)) + 
  stat_ecdf(aes(x=carat), lwd=1, linetype="dotted") + 
  facet_wrap(~color, ncol=4)

Instead of n panels, each showing one subset's ECDF plus the overall ECDF (dotted), I get each subset's ECDF plotted twice.

I'm sure I'm missing something obvious. Feel free to point me to a relevant question if this duplicates someone else's.

kategorically

1 Answer

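Another hacky solution: remove the color variable from the diamonds data set when plotting the overall ECDF. In the original code the dotted layer inherits the full data, including color, so facet_wrap(~color) splits it by level as well and it just redraws each subset's ECDF. A layer whose data does not contain the faceting variable is instead repeated in every panel: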

ggplot(diamonds) + 
  stat_ecdf(aes(x=carat, colour = color)) + 
  stat_ecdf(data=diamonds[, names(diamonds) != "color"], aes(x=carat), lwd=1, linetype="dotted") + 
  facet_wrap(~color, ncol=4)
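A slightly more explicit sketch of the same trick, for readability: it drops the column with base R assignment of NULL instead of logical indexing, and sets the line width via linewidth, which assumes ggplot2 3.4.0 or later (on older versions keep lwd or size). The name overall is just an illustrative choice, not anything from the original answer.

```r
library(ggplot2)

# Copy of the full data with the faceting variable removed.
# Because this layer has no `color` column, facet_wrap() repeats it in every panel.
overall <- diamonds
overall$color <- NULL

p <- ggplot(diamonds, aes(x = carat)) +
  # Per-level ECDF: this layer has `color`, so it is split across panels
  stat_ecdf(aes(colour = color)) +
  # Overall ECDF: same curve drawn in each panel (linewidth assumes ggplot2 >= 3.4.0)
  stat_ecdf(data = overall, linewidth = 1, linetype = "dotted") +
  facet_wrap(~color, ncol = 4)

print(p)
```

Building the plot with ggplot_build(p) and inspecting the PANEL column of the second layer's data is one way to confirm the dotted layer was copied into all seven panels rather than split by level.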
rrs