
There is one complication with dplyr that I haven't been able to solve: I want to sort by a second grouping within an already sorted first grouping.

So I have this data.frame:

a_table <- data.frame(id=1:30, 
    grp1 = sample(LETTERS[1:5], 30, replace=TRUE, prob=c(1,1,2,2,3)), 
    grp2 = sample(letters[6:8], 30, replace=TRUE, prob=c(2,2,3))) 

I first group by grp1, count the entries in each group and order by that count; then, within each grp1, I count the entries for each grp2 value and order by that count.

My attempt to do this:

a_summary <- a_table %>% 
    group_by(grp1) %>% 
        mutate(frst_count = n()) %>% 
        arrange(desc(frst_count)) %>% 
    group_by(grp2) %>% 
        mutate(scnd_count = n()) %>% 
        arrange(desc(scnd_count))

But something is obviously missing: there is no grouped summarise, and therefore no sorting within groups. My other attempts with summarise didn't keep groups 1 and 2 distinct.

Thanks.

zx8754
Diego-MX

1 Answer


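By default, group_by has add = FALSE (the argument is named .add in dplyr 1.0.0 and later), so the second group_by call replaces the grouping by grp1 instead of adding grp2 as a second level. As a result, scnd_count is counted over grp2 alone rather than within each grp1.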
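To see the difference, here is a minimal sketch that inspects the grouping variables after each variant. It assumes dplyr 1.0.0 or later (hence .add rather than add), and the three-row data frame df is made up purely for illustration.

```r
library(dplyr)

# Hypothetical toy data, only used to inspect the grouping structure
df <- data.frame(
  grp1 = c("A", "A", "B"),
  grp2 = c("f", "g", "f")
)

# Second group_by() replaces the first grouping (default .add = FALSE)
overwritten <- df %>% group_by(grp1) %>% group_by(grp2)

# Second group_by() is stacked on top of the first
added <- df %>% group_by(grp1) %>% group_by(grp2, .add = TRUE)

group_vars(overwritten)  # only grp2 remains
group_vars(added)        # grp1 and grp2
```
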

You could use:

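library(dplyr)

a_table %>%
    group_by(grp1) %>%
    mutate(frst_count = n()) %>%        # rows per grp1
    group_by(grp2, add = TRUE) %>%      # keep grp1, add grp2 (.add = TRUE in dplyr >= 1.0.0)
    mutate(scnd_count = n()) %>%        # rows per grp1/grp2 pair
    arrange(frst_count, scnd_count)     # wrap each in desc() for largest counts first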
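For a version whose result is easy to check by hand, the following sketch applies the same idea to a small fixed table. It is only an illustration under a few assumptions: dplyr 1.0.0 or later (for .add), descending order as in the question's attempt, and grp1 and grp2 added as tie-breakers so that groups with equal counts stay together. The data frame toy and the name toy_counts are hypothetical.

```r
library(dplyr)

# Hand-made table so the counts can be verified by eye
# (hypothetical data, not the random sample from the question)
toy <- data.frame(
  id   = 1:6,
  grp1 = c("A", "A", "A", "B", "B", "C"),
  grp2 = c("f", "f", "g", "f", "g", "g")
)

toy_counts <- toy %>%
  group_by(grp1) %>%
  mutate(frst_count = n()) %>%       # rows per grp1
  group_by(grp2, .add = TRUE) %>%    # grouped by grp1 AND grp2 from here on
  mutate(scnd_count = n()) %>%       # rows per grp1/grp2 pair
  ungroup() %>%
  # largest grp1 first, then largest grp2 within each grp1;
  # grp1 and grp2 break ties between equal counts
  arrange(desc(frst_count), grp1, desc(scnd_count), grp2)
```

Here grp1 "A" has three rows, two of them with grp2 "f", so those two rows sort to the top, followed by the A/g row, then the B rows, then C.
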
jeremycg