Efficiently inserting default missing rows in a data.table

Question

Suppose I've got the following data.table :

dt <- data.table(id=c(1,1,1,1,1,1,2,2,2,2),
           wday=c("mon","tue","wed","thu","fri","sat","mon","tue","thu","fri"),
           val=c(2,3,5,8,6,2,3,4,2,6))

    id wday val
 1:  1  mon   2
 2:  1  tue   3
 3:  1  wed   5
 4:  1  thu   8
 5:  1  fri   6
 6:  1  sat   2
 7:  2  mon   3
 8:  2  tue   4
 9:  2  thu   2
10:  2  fri   6

This is the result of an aggregation of another data.table. It represents the count (val) of a variable depending on the week day (wday) for different individuals (id). The problem is, during my operations I've lost the week days where the count is 0.

So the question is : how could I update my data.table object efficiently by inserting, for each id, as many rows as there are missing week days with val=0 ?

The result would be the following :

    id wday val
 1:  1  mon   2
 2:  1  tue   3
 3:  1  wed   5
 4:  1  thu   8
 5:  1  fri   6
 6:  1  sat   2
 7:  1  sun   0
 8:  2  mon   3
 9:  2  tue   4
10:  2  wed   0
11:  2  thu   2
12:  2  fri   6
13:  2  sat   0
14:  2  sun   0

Thanks a lot for your help.

score 2 · Accepted Answer · edited May 13 '13 at 09:36

2

One straightforward way I could think of right now is to use expand.grid to get all combinations and then use that to subset with allow.cartesian = TRUE:

setkey(dt, "id", "wday")
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
idx <- expand.grid(vals, unique(dt$id))[, 2:1]
dt[J(idx), allow.cartesian=TRUE]

#     id wday val
#  1:  1  mon   2
#  2:  1  tue   3
#  3:  1  wed   5
#  4:  1  thu   8
#  5:  1  fri   6
#  6:  1  sat   2
#  7:  1  sun  NA
#  8:  2  mon   3
#  9:  2  tue   4
# 10:  2  wed  NA
# 11:  2  thu   2
# 12:  2  fri   6
# 13:  2  sat  NA
# 14:  2  sun  NA

Alternatively, it is possible to directly build the idx data table with CJ :

dt[CJ(unique(dt$id),vals), allow.cartesian=TRUE]

edited May 13 '13 at 09:36

juba

47,631
14
113
118

answered May 13 '13 at 09:27

Arun

116,683
26
284
387

1

Thanks a lot @Arun ! I just edited your answer slightly to mention the possibility to use `CJ` instead of `expand.grid`. Feel free to keep, delete or modify it. – juba May 13 '13 at 09:37
Oh yes indeed @juba, great! I hadn't used that in a while... thanks for reminding me as well. – Arun May 13 '13 at 09:52

score 1 · Answer 2 · answered May 13 '13 at 09:36

1

One other possilibity with match and ddply :

FUN <- function(x) {
y <- x$val[match(c("mon", "tue", "wed", "thu", "fri", "sat", "sun"), x$wday, nomatch=NA)]
y[is.na(y)] <- 0
y <- data.frame(wday=c("mon", "tue", "wed", "thu", "fri", "sat", "sun"), val=y)
y
}
ddply(dt, .(id), FUN)

answered May 13 '13 at 09:36

droopy

2,788
1
14
12

Thanks for the `plyr` solution. I think I'll accept the `data.table` one, but this one works perfectly well, too. – juba May 13 '13 at 09:46

Efficiently inserting default missing rows in a data.table

2 Answers2

Linked