2

Goal: Grouped by quarter and name I want to have the top n names by count (see example below). So the desired output for top 1 (for the example below) would be:

2019 Q1  Klaus 2
2019 Q2   Karl 3

As this is just a toy example going forward I also want to have the top 4, 5 etc by count per quarter and name. Do you have any good ideas how to implement this with data.table (no dplyr please). Many thanks!

library(data.table)

dt <- data.table(x = c("2019 Q1", "2019 Q1", "2019 Q1", "2019 Q2", "2019 Q2", "2019 Q2", "2019 Q2"),
                 y = c("Klaus", "Gustav", "Klaus", "Karl", "Karl", "Karl", "Stefan"))

# Structure of dt
# x      y
# 1: 2019 Q1  Klaus
# 2: 2019 Q1 Gustav
# 3: 2019 Q1  Klaus
# 4: 2019 Q2   Karl
# 5: 2019 Q2   Karl
# 6: 2019 Q2   Karl
# 7: 2019 Q2 Stefan


dt[, .N, by = .(x, y)]

# Output:
# x      y N
# 1: 2019 Q1  Klaus 2
# 2: 2019 Q1 Gustav 1
# 3: 2019 Q2   Karl 3
# 4: 2019 Q2 Stefan 1
Wimpel
  • 26,031
  • 1
  • 20
  • 37
rkraft
  • 495
  • 4
  • 16

4 Answers4

5

You can first calculate the N per name and quarter and then order the data.table and afterwards pick the first n rows per quarter:

dt[, .N, by = .(x, y)][order(-N), head(.SD, 1), by = x]
Gilean0709
  • 1,098
  • 6
  • 17
  • nice one, hadn't thought about `head()` not leading to `NA` when a top_n does't (fully) exist in every group... Will keep this in mind for my own work ;-) – Wimpel Feb 07 '20 at 09:15
1

Here is a base R solution using aggregate

> aggregate(y~x,dt,function(v) as.matrix(head(data.frame(sort(table(v),decreasing = TRUE)),1)))
        x   y.1 y.2
1 2019 Q1 Klaus   2
2 2019 Q2  Karl   3
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
1

here is another data.table approach, almost the same as Gilean's answer, but without head().

dt[, .N, by = .(x,y) ][ order(-N), .SD[1:1], by = x ]

#          x     y N
# 1: 2019 Q2  Karl 3
# 2: 2019 Q1 Klaus 2
Wimpel
  • 26,031
  • 1
  • 20
  • 37
0

Another solution with data.table:

dt[, .N, by = .(x, y)][, .SD[N==max(N)][1], x]

or better

dt[, .N, by = .(x, y)][, head(.SD[N==max(N)], 1), x]
jogo
  • 12,469
  • 11
  • 37
  • 42