I would like to plot the cumulative count for two groups, with each group's curve rescaled so that it tops out at 1.

I know how to plot the density in this case:

library(ggplot2)

my_df = data.frame(col_1 = sample(c(0,1), 1000, replace = TRUE),
                   col_2 = sample(seq(1,100,by=1), 1000, replace = TRUE))
my_df$col_1 <- as.factor(my_df$col_1)
ggplot(data = my_df, aes(x = col_2, group = col_1, col = col_1))+
  geom_density(position = 'dodge', size = 2)

Here is the plot: [image: pdf]

However, if I try to plot the cumulative sum, I get the following picture: [image: cdf]

As you can see, the second line starts at the level where the first line ends.

Is there a fix for this? I can always do the computations manually and plot the result, but I wonder whether there is a ggplot solution. I found some solutions on SO, but they do not involve scaling the data to level 1.

ekad
user1700890
  • 1
    is this what you want? `stat_bin(aes(y=cumsum(..count..)),geom="line")` or try the `stat_ecdf()` – Roman Oct 13 '17 at 09:57
  • @Jimbou I tried: `ggplot(data = my_df, aes(x = col_2,y = cumsum(..count..), group = col_1, col = col_1))+ stat_bin(aes(y=cumsum(..count..)),geom="line")`. It did not work: starts at 500. I also tried: `ggplot(data = my_df, aes(x = col_2,y = cumsum(..count..), group = col_1, col = col_1))+ stat_ecdf(aes(y=cumsum(..count..)),geom="line")` also did not work. It returned: `Error in FUN(X[[i]], ...) : object 'count' not found` – user1700890 Oct 13 '17 at 15:02
  • 2
    try `ggplot(data = my_df, aes(x = col_2, group = col_1, col = col_1)) +stat_ecdf()` – Roman Oct 13 '17 at 15:05
  • It worked! Thank you! – user1700890 Oct 13 '17 at 16:42

1 Answer

Strictly speaking, this was already answered by @Roman in the comments, but I am posting it as an answer to make it clear and to expand on it a little.

Create the data:

library(ggplot2)

my_df = data.frame(col_1 = sample(c(0,1), 1000, replace = TRUE),
                   col_2 = sample(seq(1,100,by=1), 1000, replace = TRUE))
my_df$col_1 <- as.factor(my_df$col_1)

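stat_ecdf() plots the empirical cumulative distribution function, that is, the cumulative count divided by the number of observations, so each curve ends at 1. To get a separate curve per group, map the grouping variable to the group aesthetic. Putting this together we get: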

ggplot(data = my_df, aes(x = col_2, 
                         group = col_1, 
                         col = col_1)
       ) +stat_ecdf()

Using stat_ecdf and group
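For comparison, here is a minimal sketch of the manual computation mentioned in the question, in case it helps to see what the curve represents. It assumes the same simulated data; the seed and the names manual_df and cum_prop are illustrative and not part of the original post. The plot step is skipped if ggplot2 is not installed.

```r
set.seed(1)  # assumption: seed added only to make this sketch reproducible
my_df <- data.frame(col_1 = factor(sample(c(0, 1), 1000, replace = TRUE)),
                    col_2 = sample(1:100, 1000, replace = TRUE))

# Sort by group and value so the running count accumulates along the x axis
manual_df <- my_df[order(my_df$col_1, my_df$col_2), ]

# Within each group: running count of observations divided by the group size
manual_df$cum_prop <- ave(rep(1, nrow(manual_df)), manual_df$col_1,
                          FUN = function(v) cumsum(v) / length(v))

# Largest cumulative proportion reached in each group
group_max <- tapply(manual_df$cum_prop, manual_df$col_1, max)
print(group_max)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(manual_df, aes(x = col_2, y = cum_prop, col = col_1)) +
    geom_step()
  print(p)
}
```

Because the division uses each group's own size, every curve starts near 0 and ends at 1 regardless of how many observations the group has, which should look essentially the same as the stat_ecdf() plot.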

You can also switch from a line to a filled area, akin to a distribution, by using geom="ribbon" and mapping ymax to the computed cumulative value with after_stat(y):

ggplot(data = my_df, aes(x = col_2, 
                         group = col_1, 
                         fill = col_1)
       ) +stat_ecdf(aes(ymin=0, ymax=after_stat(y)),
                    geom="ribbon",alpha=0.2)

Using stat_ecdf and group filled

More on this in another SO thread

A_Murphy