Generating a stacked cumulative smooth frequency distribution plot

Question

I have data that records two types of events, type_a and type_b, along with the year in which each one occurred.

This is one way to generate an example of my data:

set.seed(1)
years <- 1991:2010
type_a_years <- 20   # type_a events can occur in all 20 years
type_b_years <- 10   # type_b events occur only in the last 10 years
# number of events per year for each type
type_a <- round(runif(type_a_years, 0, 5))
type_b <- c(rep(0, type_a_years - type_b_years), round(runif(type_b_years, 5, 7)))

# one row per event: the year it occurred and its type
df <- data.frame(year = unlist(sapply(1:length(years), function(x) c(rep(years[x], type_a[x]), rep(years[x], type_b[x])))),
                 type = unlist(sapply(1:length(years), function(x) c(rep("type_a", type_a[x]), rep("type_b", type_b[x])))))

head(df)
  year   type
1 1991 type_a
2 1992 type_a
3 1992 type_a
4 1993 type_a
5 1993 type_a
6 1993 type_a

I'd like to generate a cumulative frequency distribution plot by year, with the type_b events stacked on top of the type_a events. I'd like the distributions drawn as curves rather than bars.

I'm guessing it should be some manipulation of:

library(ggplot2)
ggplot(df, aes(year)) + stat_ecdf()

The result should be two curves stacked by type, with the area under each one filled in a different color. That is, the area between the type_a curve and the x-axis should be one color, and the area between the type_b curve and the type_a curve should be another.
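stat_ecdf() plots an empirical cumulative distribution. That is a proportion that always ends at 1, so two such curves would not stack by event count. The base-R sketch below makes the difference concrete. It reuses the question's generating vectors. The names a_years, Fa, cum_a, cum_b and bands (with columns a_top and b_top) are illustrative only. The sketch computes cumulative counts and the band edges described above.

```r
# Regenerate the per-year counts used in the question
set.seed(1)
years  <- 1991:2010
type_a <- round(runif(20, 0, 5))
type_b <- c(rep(0, 10), round(runif(10, 5, 7)))

# An empirical CDF, as drawn by stat_ecdf(), is a proportion that ends at 1
a_years <- rep(years, type_a)   # one entry per type_a event
Fa <- ecdf(a_years)
Fa(max(years))                  # 1, however many events there were

# Cumulative counts keep each type's scale, which is what stacking needs
cum_a <- cumsum(type_a)
cum_b <- cumsum(type_b)
all.equal(Fa(years) * length(a_years), cum_a)  # TRUE: ECDF times n gives the counts

# Band edges for the stacked plot (illustrative names):
# type_a fills 0..a_top, type_b fills a_top..b_top
bands <- data.frame(year = years, a_top = cum_a, b_top = cum_a + cum_b)
head(bands)
```

In ggplot2 these edges could be drawn explicitly with two geom_ribbon() layers using ymin and ymax. The answer below stacks the counts with geom_area() instead.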

score 3 · Answer 1 · answered Dec 03 '15 at 18:30


You can do it by aggregating the counts with dplyr and tidyr and then plotting them with geom_area.

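library(tidyr)
library(dplyr)
df1 <- df %>%
  group_by(type, year) %>%
  summarise(total = n()) %>%                 # events per type and year
  ungroup() %>%
  complete(type, year = full_seq(year, 1),   # add every type/year combination...
           fill = list(total = 0)) %>%       # ...counting missing ones as 0
  arrange(type, year) %>%
  group_by(type) %>%
  mutate(total = cumsum(total)) %>%          # running total; filled years carry it forward
  ungroup()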
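If dplyr or tidyr cannot be used, the same aggregation can be done in base R. The sketch below swaps in table() and cumsum() for the dplyr/tidyr pipeline. It also rebuilds df with vectorized rep() calls, which gives the same rows as the question's df in a different order. Every year is listed as a factor level, so years with no events count as 0 before the running total is taken. The intermediate names counts and cum are illustrative.

```r
# Rebuild the question's data with vectorized rep() (same rows, different order)
set.seed(1)
years  <- 1991:2010
type_a <- round(runif(20, 0, 5))
type_b <- c(rep(0, 10), round(runif(10, 5, 7)))
df <- data.frame(year = c(rep(years, type_a), rep(years, type_b)),
                 type = rep(c("type_a", "type_b"), c(sum(type_a), sum(type_b))))

# Events per year and type; every year is a level, so empty years count as 0
counts <- table(factor(df$year, levels = years), df$type)

# Running total within each type (one column per type)
cum <- apply(counts, 2, cumsum)

# Long format with the same columns as df1 in the answer
df1 <- data.frame(year  = rep(years, times = ncol(cum)),
                  type  = rep(colnames(cum), each = nrow(cum)),
                  total = as.vector(cum))
head(df1)
```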

Now plot:

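library(ggplot2)
ggplot(df1, aes(x = year, y = total, fill = type)) +
  geom_area(position = position_stack(reverse = TRUE))  # type_a at the bottom, type_b on top (ggplot2 >= 2.2.0 stacks the first level on top by default)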


jeremycg


Thanks jeremycg. I'm getting this error though: Error in complete(`df %>% group_by(type, year) %>% summarise(total = n()) %>% mutate(total = cumsum(total)) %>% ungroup`, : unused argument (fill = list(total = 0)) – dan Dec 03 '15 at 18:56
You might have an outdated version of `tidyr`; `complete()` was only added in v0.3.0. Try reinstalling it. – jeremycg Dec 03 '15 at 19:00
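The installed tidyr can be checked before running the pipeline. Below is a minimal sketch. It assumes, as the comment above says, that complete() first appeared in tidyr 0.3.0. The helper name tidyr_has_complete is hypothetical.

```r
# Hypothetical helper: TRUE when tidyr is installed and at least version 0.3.0,
# the release that introduced complete()
tidyr_has_complete <- function() {
  requireNamespace("tidyr", quietly = TRUE) &&
    utils::packageVersion("tidyr") >= "0.3.0"
}

if (!tidyr_has_complete()) {
  message("tidyr is missing or too old; try install.packages(\"tidyr\")")
}
```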
