
I'm trying to produce a cumulative distribution plot using ggplot and stat_ecdf. Since I need the area under the curve to be filled, I'm using geom = "ribbon".

I need the x axis to go up to only 20. However, I want the cumulative function to be estimated over the full range, up to 100 (the highest value of my variable). For this, I'm using coord_cartesian.
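A minimal sketch of why coord_cartesian() rather than axis limits fits this goal, on hypothetical data (`df` and `var_x` are stand-ins with values from 0 to 100). coord_cartesian() only zooms the view, so stat_ecdf() still sees every row. Setting limits on the x scale instead turns out-of-range values into NA before the statistic is computed, so the curve would wrongly reach 100% at 20.

```r
library(ggplot2)

# Hypothetical data: values from 0 to 100, as described in the question
set.seed(1)
df <- data.frame(var_x = sample(0:100, 500, replace = TRUE))

base <- ggplot(df, aes(x = var_x)) + stat_ecdf()

# Zooming: all rows reach stat_ecdf(), the view is cropped afterwards
zoomed <- layer_data(base + coord_cartesian(xlim = c(0, 20)))

# Scale limits: rows outside 0..20 become NA and are dropped before the stat
limited <- suppressWarnings(
  layer_data(base + scale_x_continuous(limits = c(0, 20)))
)

# ECDF height at the largest finite x not exceeding 20 in each version
ecdf_at_20 <- function(d) max(d$y[is.finite(d$x) & d$x <= 20])
ecdf_at_20(zoomed)   # about 0.2, the share of all values <= 20
ecdf_at_20(limited)  # 1, because only values <= 20 were counted
```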

This is the code I'm using:

library(ggplot2)

Graph <-
    ggplot(df, aes(x = var_x, fill = year)) +
    stat_ecdf(geom = "ribbon", alpha = 0.5) +
    coord_cartesian(xlim = c(0, 20)) +
    scale_y_continuous(breaks = seq(0, 1, 0.1),
                       labels = scales::percent,
                       limits = c(0, 1),
                       expand = c(0, 0))

However, I get the following error:

Error: geom_ribbon requires the following missing aesthetics: ymin and ymax or xmin and xmax

Does anyone know how to fix this? Any help is appreciated!

  • Use `geom_area`, not `geom_ribbon`. Use `geom_ribbon` when you have data for the width you want shaded (like for a confidence interval). Use `geom_area` when you want to shade from the axis up to the line. – Gregor Thomas Mar 19 '22 at 01:50
  • It doesn't work. When I changed it to `geom_area`, the graph doesn't show any distribution. – Adriana Castillo Castillo Mar 19 '22 at 02:33
  • Please share some sample data so we can attempt to debug your code. `dput()` works well for sharing copy/pasteable sample data, e.g., `dput(df[1:20, ])` for the first 20 rows. Choose a suitable small sample. – Gregor Thomas Mar 19 '22 at 04:41
  • I think what Gregor meant was to use geom = "area", like `stat_ecdf(geom = "area", alpha = 0.5)`. Have you tried this? – tjebo Mar 19 '22 at 13:14
  • The data that I'm using is `df<- data.frame(replicate(1,sample(0:500,1000,rep=TRUE)))`. My code is: `Graph_t <- ggplot(df, aes(x=df$replicate.1..sample.0.500..1000..rep...TRUE..)) + stat_ecdf(geom = "area",alpha=0.5) + coord_cartesian(xlim = c(0, 20))` (it doesn't show any distribution; the graph shows up empty). Many thanks to both! – Adriana Castillo Castillo Mar 19 '22 at 16:05
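A minimal sketch of the comment thread's suggestion on the sample data from the last comment, with the auto-generated column renamed to a hypothetical `var_x` so it can be referenced by name inside aes() instead of via `df$`. It also computes the ECDF height at 20: with values spread over 0 to 500, only about 4% of them fall at or below 20, so within the zoomed window the filled area is a thin strip near zero on a 0 to 1 y axis. That may be why the plot looks empty.

```r
library(ggplot2)

# Sample data from the comment above; column renamed to var_x (hypothetical)
set.seed(42)
df <- data.frame(var_x = sample(0:500, 1000, replace = TRUE))

# Share of observations at or below 20, i.e. the ECDF height at the right
# edge of the zoomed window; expected to be roughly 21/501, about 0.04
share_at_20 <- ecdf(df$var_x)(20)
print(share_at_20)

# Filled ECDF as suggested in the comments; coord_cartesian() only zooms,
# so the ECDF is still computed over the full 0..500 range
graph_t <- ggplot(df, aes(x = var_x)) +
  stat_ecdf(geom = "area", alpha = 0.5) +
  coord_cartesian(xlim = c(0, 20))
```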

1 Answer


I found a "manual" solution. First, I created a variable equal to the cumulative distribution of my variable of interest:

library(dplyr)  # provides %>%, mutate() and cume_dist()

df <-
  df %>%
  mutate(cumula_var = cume_dist(var_x))
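A small check, on a hypothetical vector, that dplyr's cume_dist() gives the same numbers as evaluating the empirical CDF (base R's ecdf()) at each observation, i.e. the quantity stat_ecdf() plots:

```r
library(dplyr)

# Small hypothetical vector
x <- c(3, 1, 2, 2)

# cume_dist(): share of values less than or equal to each element
cd <- cume_dist(x)
cd                     # 1.00 0.25 0.75 0.75

# Same numbers as the empirical CDF evaluated at each observation
ecdf_vals <- ecdf(x)(x)
all.equal(cd, ecdf_vals)  # TRUE
```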

Then, I made the graph:

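Graph <-
  ggplot(df, aes(x = var_x, y = cumula_var)) +
  # shade from 0 up to the precomputed cumulative share; drawn first so the
  # line stays visible on top
  geom_ribbon(aes(ymin = 0, ymax = cumula_var), alpha = 0.5) +
  geom_line() +
  coord_cartesian(xlim = c(0, 20))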
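The question maps fill = year, which the code above drops. A sketch extending the same approach to several groups, assuming `df` has a `year` column as in the question (the data below are hypothetical): computing cume_dist() inside group_by() gives each year its own cumulative distribution over the full range of `var_x`, and coord_cartesian() then zooms to 0 to 20 without dropping rows.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the question's data: var_x in 0..100, two years
set.seed(123)
df <- data.frame(
  var_x = sample(0:100, 1000, replace = TRUE),
  year  = factor(sample(c(2020, 2021), 1000, replace = TRUE))
)

# Per-year cumulative share computed over the full range of var_x
df <- df %>%
  group_by(year) %>%
  mutate(cumula_var = cume_dist(var_x)) %>%
  ungroup()

# One shaded curve per year, zoomed to 0..20 without dropping data
Graph <- ggplot(df, aes(x = var_x, y = cumula_var, fill = year)) +
  geom_ribbon(aes(ymin = 0, ymax = cumula_var), alpha = 0.5) +
  geom_line(aes(colour = year)) +
  coord_cartesian(xlim = c(0, 20)) +
  scale_y_continuous(labels = scales::percent)
```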