4

Suppose I've got the following data.table :

dt <- data.table(id=c(1,1,1,1,1,1,2,2,2,2),
           wday=c("mon","tue","wed","thu","fri","sat","mon","tue","thu","fri"),
           val=c(2,3,5,8,6,2,3,4,2,6))

    id wday val
 1:  1  mon   2
 2:  1  tue   3
 3:  1  wed   5
 4:  1  thu   8
 5:  1  fri   6
 6:  1  sat   2
 7:  2  mon   3
 8:  2  tue   4
 9:  2  thu   2
10:  2  fri   6

This is the result of an aggregation of another data.table. It represents the count (val) of a variable depending on the week day (wday) for different individuals (id). The problem is, during my operations I've lost the week days where the count is 0.

So the question is : how could I update my data.table object efficiently by inserting, for each id, as many rows as there are missing week days with val=0 ?

The result would be the following :

    id wday val
 1:  1  mon   2
 2:  1  tue   3
 3:  1  wed   5
 4:  1  thu   8
 5:  1  fri   6
 6:  1  sat   2
 7:  1  sun   0
 8:  2  mon   3
 9:  2  tue   4
10:  2  wed   0
11:  2  thu   2
12:  2  fri   6
13:  2  sat   0
14:  2  sun   0

Thanks a lot for your help.

juba
  • 47,631
  • 14
  • 113
  • 118

2 Answers2

2

One straightforward way I could think of right now is to use expand.grid to get all combinations and then use that to subset with allow.cartesian = TRUE:

setkey(dt, "id", "wday")
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
idx <- expand.grid(vals, unique(dt$id))[, 2:1]
dt[J(idx), allow.cartesian=TRUE]

#     id wday val
#  1:  1  mon   2
#  2:  1  tue   3
#  3:  1  wed   5
#  4:  1  thu   8
#  5:  1  fri   6
#  6:  1  sat   2
#  7:  1  sun  NA
#  8:  2  mon   3
#  9:  2  tue   4
# 10:  2  wed  NA
# 11:  2  thu   2
# 12:  2  fri   6
# 13:  2  sat  NA
# 14:  2  sun  NA

Alternatively, it is possible to directly build the idx data table with CJ :

dt[CJ(unique(dt$id),vals), allow.cartesian=TRUE]
juba
  • 47,631
  • 14
  • 113
  • 118
Arun
  • 116,683
  • 26
  • 284
  • 387
  • 1
    Thanks a lot @Arun ! I just edited your answer slightly to mention the possibility to use `CJ` instead of `expand.grid`. Feel free to keep, delete or modify it. – juba May 13 '13 at 09:37
  • Oh yes indeed @juba, great! I hadn't used that in a while... thanks for reminding me as well. – Arun May 13 '13 at 09:52
1

One other possilibity with match and ddply :

FUN <- function(x) {
y <- x$val[match(c("mon", "tue", "wed", "thu", "fri", "sat", "sun"), x$wday, nomatch=NA)]
y[is.na(y)] <- 0
y <- data.frame(wday=c("mon", "tue", "wed", "thu", "fri", "sat", "sun"), val=y)
y
}
ddply(dt, .(id), FUN)
droopy
  • 2,788
  • 1
  • 14
  • 12
  • Thanks for the `plyr` solution. I think I'll accept the `data.table` one, but this one works perfectly well, too. – juba May 13 '13 at 09:46