3

When dcasting a data.table, if an id falls under multiple categories / fields, it returns the length of values by default.

dta <- (data.table(ID = c("A", "C", "B", "A", "D", "D", "A", "B", "D", "D"), 
                   CATEGORY_DUPLICATE = c(LETTERS[17:23], LETTERS[20:22]),
                   CATEGORY_UNIQUE = c(LETTERS[17:23], c("T", "U.1", "V.1")), 
                   VALUE = c(400, 400, 500, 300, 500, 100, 200, 300, 400, 500)))

dcast(dta, ID ~ CATEGORY_DUPLICATE, value.var = "VALUE")

ID  Q   R   S   T   U   V   W
A   1   0   0   1   0   0   1
B   0   0   1   1   0   0   0
C   0   1   0   0   0   0   0
D   0   0   0   0   2   2   0

If one id corresponds to no more than one category, only then it returns the actual value itself. I have appended a 1 to U and V column in the sample table to demonstrate this.

dcast(dta, ID ~ CATEGORY_UNIQUE, value.var = "VALUE")

ID  Q   R   S   T   U   U.1 V   V.1 W
A   400 NA  NA  300 NA  NA  NA  NA  200
B   NA  NA  500 300 NA  NA  NA  NA  NA
C   NA  400 NA  NA  NA  NA  NA  NA  NA
D   NA  NA  NA  NA  500 400 100 500 NA

I am looking to achieve this result within the dcast query itself - such that multiple groups for an id generates multiple columns with actual values in them - instead of the length of category for the id.

Please help

Sanjid
  • 67
  • 6

2 Answers2

2

Perhaps creating unique columns prior to dcasting is an option?

dcast(
    dt[, CATEGORY_UNIQUE := paste0(CATEGORY, 1:.N), by = CATEGORY],
    ID ~ CATEGORY_UNIQUE, value.var = "VALUE")
#   ID  Q1  R1  S1  T1  T2  U1  U2  V1  V2  W1
#1:  A 400  NA  NA 300  NA  NA  NA  NA  NA 200
#2:  B  NA  NA 500  NA 300  NA  NA  NA  NA  NA
#3:  C  NA 400  NA  NA  NA  NA  NA  NA  NA  NA
#4:  D  NA  NA  NA  NA  NA 500 400 100 500  NA

Sample data

library(data.table)
dt <- (data.table(
    ID = c("A", "C", "B", "A", "D", "D", "A", "B", "D", "D"),
    CATEGORY = c(LETTERS[17:23], LETTERS[20:22]),
    VALUE = c(400, 400, 500, 300, 500, 100, 200, 300, 400, 500)))
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • 1
    alternatively, we can do: `dcast(dta, ID ~ CATEGORY_DUPLICATE + rowid(CATEGORY_DUPLICATE), value.var = 'VALUE')` – mt1022 Sep 06 '19 at 05:57
2

You can also do:

dcast(dta[, CATEGORY_UNIQUE := make.unique(CATEGORY_DUPLICATE), by = ID],
      ID ~ CATEGORY_UNIQUE, value.var = "VALUE")

   ID   Q   R   S   T   U U.1   V V.1   W
1:  A 400  NA  NA 300  NA  NA  NA  NA 200
2:  B  NA  NA 500 300  NA  NA  NA  NA  NA
3:  C  NA 400  NA  NA  NA  NA  NA  NA  NA
4:  D  NA  NA  NA  NA 500 400 100 500  NA
tmfmnk
  • 38,881
  • 4
  • 47
  • 67