Plot density and cumulative density function in one combined plot using ggplot2

Question

I would like to get a plot that combines the density of observations and the cdf.

The usual problem is that the scales of the two are way off. How can this be remedied, i.e., can two scales be used or, alternatively, can one of the data series be rescaled (preferably within ggplot, as I would like to separate computation and display of data)?

Here's the code so far:

`dput(tmp)` yields (with the non-reproducible `.internal.selfref` pointer dropped):

structure(list(drivenkm = c(8, 11, 21, 4, 594, 179, 19, 7, 10, 36)), .Names = "drivenkm", class = c("data.table", "data.frame"), row.names = c(NA, -10L))

then I do

p = ggplot(data = tmp, aes(x = drivenkm)) +
   geom_histogram(aes(y = ..density..), alpha = 0.2, binwidth = 3) +
   stat_ecdf(aes(x = drivenkm))
print(p)

In the resulting plot (image not reproduced here), the scales are obviously way off. How can this be fixed so that both the histogram and the cdf can be interpreted in a sensible way?

Thanks!

score 5 · Accepted Answer · answered Jan 14 '14 at 09:54

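The density is the count in each bin divided by (number of observations × binwidth), so that the total area of the bars is 1. The ECDF, by contrast, rises from 0 to 1 in height. Multiplying the histogram's y by the binwidth (3 here) turns the density back into the share of observations per bin, which lies on the same 0-to-1 scale as the ECDF: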

p = ggplot(data = tmp, aes(x = drivenkm)) +
   geom_histogram(aes(y = 3*..density..), alpha = 0.2, binwidth = 3) +
   stat_ecdf(aes(x = drivenkm))
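The arithmetic behind the factor 3 can be checked without ggplot2. The following is only a sketch using base R's `hist()`; the break points are an assumption chosen for illustration and need not coincide with the bins ggplot2 picks.

```r
# Data from the question
drivenkm <- c(8, 11, 21, 4, 594, 179, 19, 7, 10, 36)
binwidth <- 3

# Assumed break points (0, 3, 6, ..., 597); ggplot2 may place its bins differently
breaks <- seq(0, 597, by = binwidth)
h <- hist(drivenkm, breaks = breaks, plot = FALSE)

# density = count / (n * binwidth), so the bar areas sum to 1
area <- sum(h$density * binwidth)

# density * binwidth = count / n, i.e. the share of observations in each bin
share <- h$density * binwidth

print(area)
print(max(share))
```

Because every share is at most 1, the rescaled bars fit on the same axis as the ECDF.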

answered by James

Thanks for the pointer with the multiplication. I had taken `..density..` from some SO snippet, but never understood what this syntax really meant and was thus afraid to touch it. – Peter Lustig Jan 14 '14 at 10:14
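On the syntax: `..density..` refers to a variable that the histogram's stat computes during binning, rather than to a column of the data. Newer ggplot2 versions (3.3.0 and later, as far as I know) write the same thing as `after_stat(density)`. A minimal sketch, assuming a recent ggplot2 is installed; note that it swaps in `count / sum(count)` for `3 * density`, which gives the same per-bin share in a single-panel plot without hard-coding the binwidth:

```r
library(ggplot2)

# Data from the question, as a plain data.frame
tmp <- data.frame(drivenkm = c(8, 11, 21, 4, 594, 179, 19, 7, 10, 36))

p <- ggplot(tmp, aes(x = drivenkm)) +
  # count / sum(count) is the share of observations per bin (between 0 and 1)
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 alpha = 0.2, binwidth = 3) +
  # the ECDF also runs from 0 to 1, so both layers share one y axis
  stat_ecdf()
```

Calling `print(p)` draws the plot.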
