`data.table` how to get `keyby` to include all combinations of factors?

Question

I have a data.table and I would like to count the occurrences of each combination of a and b:

library(data.table)
library(magrittr)  # provides %>%

dt1 <- data.table(
  a = c(1,1,1,1,2,2,2,2,3,3,3,3),
  b = c(1,1,2,2,1,1,1,1,1,2,2,2) %>% letters[.]
)
#     a b
#  1: 1 a
#  2: 1 a
#  3: 1 b
#  4: 1 b
#  5: 2 a
#  6: 2 a
#  7: 2 a
#  8: 2 a
#  9: 3 a
# 10: 3 b
# 11: 3 b
# 12: 3 b
dt1[, .N, keyby = .(a, b)]
#    a b N
# 1: 1 a 2
# 2: 1 b 2
# 3: 2 a 4
# 4: 3 a 1
# 5: 3 b 3

This misses the combination a == 2 & b == "b", which has a zero count in dt1. I want it to be included, so that the result looks like this:

#    a b c
# 1: 1 a 2
# 2: 1 b 2
# 3: 2 a 4
# 4: 2 b 0
# 5: 3 a 1
# 6: 3 b 3

The most intuitive way would be a loop or the apply family, but that is too inefficient for my large datasets. Any ideas?

`c(1,1,2,2,1,1,1,1,1,2,2,2) %>% letters[.]` +1 for this great idea! — drmariod, Aug 15 '18 at 06:29

score 1 · Accepted Answer · answered Aug 15 '18 at 06:34


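Here is a tidyr/dplyr approach:

library(dplyr)
library(tidyr)

dt1 %>% 
  group_by(a, b) %>% 
  summarise(c = n()) %>%                # n() counts the rows in each (a, b) group
  ungroup() %>%
  complete(a, b, fill = list(c = 0))   # add the missing (a, b) pairs with c = 0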
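Since the question asks about data.table specifically, here is a sketch of the same idea without leaving data.table. It is not the answerer's code, and it assumes a data.table version that supports `on =` joins: count the observed pairs with `keyby`, build every (a, b) pair with `CJ()`, right-join the counts onto that grid, and turn the resulting NA counts into 0. The names `counts`, `all_combos` and `res` are illustrative only.

```r
library(data.table)

# Same data as in the question (letters[...] instead of the %>% trick)
dt1 <- data.table(
  a = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  b = letters[c(1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2)]
)

# Count only the combinations that actually occur
# (counts, all_combos and res are illustrative names)
counts <- dt1[, .N, keyby = .(a, b)]

# Cross join: every combination of the observed a and b values (sorted by default)
all_combos <- CJ(a = unique(dt1$a), b = unique(dt1$b))

# Right join: keep every row of all_combos; pairs absent from counts get N = NA
res <- counts[all_combos, on = .(a, b)]

# Turn the missing counts into explicit zeros (by reference)
res[is.na(N), N := 0L]
res
```

Because `CJ()` sorts its inputs by default, `res` comes out ordered by `a`, then `b`, matching the desired output, with the count column named `N` rather than `c`.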


Aleksandr


Could you please explain `fill`? Is it a keyword of `summarise`? How does it actually work? – RavinderSingh13 Aug 15 '18 at 06:59

`complete()` is a tidyr function, not part of `summarise`; it is a wrapper around `expand()`, `dplyr::full_join()` and `replace_na()`. You can pass a `fill` value to replace the resulting NAs; in this case the NAs are replaced by 0. – Aleksandr Aug 15 '18 at 08:09
Thanks for the reply. So the NAs will be replaced with `0`, but what does `list(c = 0)` do? – RavinderSingh13 Aug 15 '18 at 08:13

There could be several columns in which you want to replace NAs, so `fill` takes a named list: you list each of those columns with its own replacement value. – Aleksandr Aug 15 '18 at 08:17
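To make the comment thread above concrete: per the tidyr documentation, `complete()` wraps `expand()`, `dplyr::full_join()` and `replace_na()`, and `fill` is a named list with one replacement value per column. The sketch below assumes dplyr >= 1.0 (for `.groups = "drop"`) and adds a hypothetical second summary column `p` (each pair's share of all rows) purely to show filling more than one column.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, as a tibble
dt1 <- tibble(
  a = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  b = letters[c(1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2)]
)

# Observed combinations only; p is a hypothetical extra column
counts <- dt1 %>%
  group_by(a, b) %>%
  summarise(c = n(), p = n() / nrow(dt1), .groups = "drop")

# Roughly what complete() does: build every (a, b) pair,
# full-join the counts onto it, then replace the NAs column by column
manual <- counts %>%
  full_join(expand(counts, a, b), by = c("a", "b")) %>%
  replace_na(list(c = 0L, p = 0)) %>%
  arrange(a, b)

# The one-call version: fill names one replacement value per column
completed <- counts %>%
  complete(a, b, fill = list(c = 0L, p = 0))
```

Both `manual` and `completed` hold the same six rows, with `c` equal to 0 and `p` equal to 0 for the pair a == 2, b == "b"; `complete()` simply performs the expand, join and replace steps in one call.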
