8

I have a data.frame like this

x <- data.frame(Category=factor(c("One", "One", "Four", "Two","Two",
"Three", "Two", "Four","Three")),
City=factor(c("D","A","B","B","A","D","A","C","C")),
Frequency=c(10,1,5,2,14,8,20,3,5))

  Category City Frequency
1      One    D        10
2      One    A         1
3     Four    B         5
4      Two    B         2
5      Two    A        14
6    Three    D         8
7      Two    A        20
8     Four    C         3
9    Three    C         5

I want to make a pivot table with sum(Frequency) and used the ddply function like this:

ddply(x,.(Category,City),summarize,Total=sum(Frequency))
  Category City Total
1     Four    B     5
2     Four    C     3
3      One    A     1
4      One    D    10
5    Three    C     5
6    Three    D     8
7      Two    A    34
8      Two    B     2

But I need this results sorted by the total in each Category group. Something like this:

Category City Frequency
1      Two    A        34
2      Two    B         2
3    Three    D        14
4    Three    C         5
5      One    D        10
6      One    A         1
7     Four    B         5
8     Four    C         3

I have looked and tried sort, order, arrange, but nothing seems to do what I need. How can I do this in R?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Liliana Pacheco
  • 881
  • 10
  • 13

2 Answers2

5

This is a nice question and I can't think of a straight way of doing this rather than creating a total size index and then sorting by it. Here's a possible data.table approach which uses setorder function which will order your data by reference

library(data.table)
Res <- setDT(x)[, .(Total = sum(Frequency)), by = .(Category, City)]
setorder(Res[, size := sum(Total), by = Category], -size, -Total, Category)[]
#    Category City Total size
# 1:      Two    A    34   36
# 2:      Two    B     2   36
# 3:    Three    D     8   13
# 4:    Three    C     5   13
# 5:      One    D    10   11
# 6:      One    A     1   11
# 7:     Four    B     5    8
# 8:     Four    C     3    8

Or if you deep in the Hdleyverse, we can reach a similar result using the newer dplyr package (as suggested by @akrun)

library(dplyr)
x %>% 
  group_by(Category, City) %>% 
  summarise(Total = sum(Frequency)) %>% 
  mutate(size= sum(Total)) %>% 
  ungroup %>%
  arrange(-size, -Total, Category)
talat
  • 68,970
  • 21
  • 126
  • 157
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
5

Here is a base R version, where DF is the result of your ddply call:

with(DF, DF[order(-ave(Total, Category, FUN=sum), Category, -Total), ])

produces:

  Category City Total
7      Two    A    34
8      Two    B     2
6    Three    D     8
5    Three    C     5
4      One    D    10
3      One    A     1
1     Four    B     5
2     Four    C     3

The logic is basically the same as David's, calculate the sum of Total for each Category, use that number for all rows in each Category (we do this with ave(..., FUN=sum)), and then sort by that plus some tie breakers to make sure stuff comes out as expected.

BrodieG
  • 51,669
  • 9
  • 93
  • 146