R sort summarise ddply by group sum

Question

I have a data.frame like this

x <- data.frame(Category=factor(c("One", "One", "Four", "Two","Two",
"Three", "Two", "Four","Three")),
City=factor(c("D","A","B","B","A","D","A","C","C")),
Frequency=c(10,1,5,2,14,8,20,3,5))

  Category City Frequency
1      One    D        10
2      One    A         1
3     Four    B         5
4      Two    B         2
5      Two    A        14
6    Three    D         8
7      Two    A        20
8     Four    C         3
9    Three    C         5

I want to make a pivot table with sum(Frequency) and used the ddply function like this:

ddply(x,.(Category,City),summarize,Total=sum(Frequency))
  Category City Total
1     Four    B     5
2     Four    C     3
3      One    A     1
4      One    D    10
5    Three    C     5
6    Three    D     8
7      Two    A    34
8      Two    B     2

But I need this results sorted by the total in each Category group. Something like this:

Category City Frequency
1      Two    A        34
2      Two    B         2
3    Three    D        14
4    Three    C         5
5      One    D        10
6      One    A         1
7     Four    B         5
8     Four    C         3

I have looked and tried sort, order, arrange, but nothing seems to do what I need. How can I do this in R?

score 5 · Answer 1 · edited Apr 08 '15 at 18:41

This is a nice question and I can't think of a straight way of doing this rather than creating a total size index and then sorting by it. Here's a possible data.table approach which uses setorder function which will order your data by reference

library(data.table)
Res <- setDT(x)[, .(Total = sum(Frequency)), by = .(Category, City)]
setorder(Res[, size := sum(Total), by = Category], -size, -Total, Category)[]
#    Category City Total size
# 1:      Two    A    34   36
# 2:      Two    B     2   36
# 3:    Three    D     8   13
# 4:    Three    C     5   13
# 5:      One    D    10   11
# 6:      One    A     1   11
# 7:     Four    B     5    8
# 8:     Four    C     3    8

Or if you deep in the Hdleyverse, we can reach a similar result using the newer dplyr package (as suggested by @akrun)

library(dplyr)
x %>% 
  group_by(Category, City) %>% 
  summarise(Total = sum(Frequency)) %>% 
  mutate(size= sum(Total)) %>% 
  ungroup %>%
  arrange(-size, -Total, Category)

score 5 · Accepted Answer · answered Apr 08 '15 at 18:45

Here is a base R version, where DF is the result of your ddply call:

with(DF, DF[order(-ave(Total, Category, FUN=sum), Category, -Total), ])

produces:

  Category City Total
7      Two    A    34
8      Two    B     2
6    Three    D     8
5    Three    C     5
4      One    D    10
3      One    A     1
1     Four    B     5
2     Four    C     3

The logic is basically the same as David's, calculate the sum of Total for each Category, use that number for all rows in each Category (we do this with ave(..., FUN=sum)), and then sort by that plus some tie breakers to make sure stuff comes out as expected.

R sort summarise ddply by group sum

2 Answers2