So what I'm trying to achieve is this: say I have a data.table DT with (say) 4 columns. I want to get the number of unique rows for every combination of 2 columns.

library(data.table)
DT <- data.table(a = 1:10, b = c(1,1,1,2,2,3,4,4,5,5), c = letters[1:10], d = c(3,3,5,2,4,2,5,1,1,5))

> DT
    a b c d
1:  1 1 a 3
2:  2 1 b 3
3:  3 1 c 5
4:  4 2 d 2
5:  5 2 e 4
6:  6 3 f 2
7:  7 4 g 5
8:  8 4 h 1
9:  9 5 i 1
10: 10 5 j 5

I tried the following code:

cols <- colnames(DT)
for (i in 1:(length(cols)-1)) {
  for (j in i+1:length(cols)) {
    print(unique(DT[,.SD, .SDcols = c(cols[i],cols[j])]))
  }
}

Here, 'i' goes from the first column to the second-to-last one, and 'j' is the column that gets combined with 'i'. So the combinations I expect to get are: ab, ac, ad, bc, bd, cd.

But it gives me the following error:

Error in [.data.table(DT, , .SD, .SDcols = c(cols[i], cols[j])) : .SDcols missing at the following indices: [2]

If someone can explain why this is and a way around it, I'll be really grateful. Thanks.

Waldi

1 Answer

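This is due to operator precedence: the : operator is evaluated before +, so i+1:length(cols) is parsed as i + (1:length(cols)). The inner index j therefore runs past the last column, cols[j] becomes NA, and .SDcols reports a missing entry: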

> 1+1:length(cols)
[1] 2 3 4 5
> (1+1):length(cols)
[1] 2 3 4
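
A small check of how this leads to the error, as a sketch that assumes the same four column names as in the question (the names bad_j and good_j are only illustrative):

```r
cols <- c("a", "b", "c", "d")
i <- 1

# Parsed as i + (1:length(cols)), so the index overshoots by i
bad_j <- i + 1:length(cols)
# With parentheses the range starts at i + 1 and stops at the last column
good_j <- (i + 1):length(cols)

print(cols[bad_j])   # last element is NA because index 5 does not exist
print(cols[good_j])
```

The NA in the second position of c(cols[i], cols[j]) is what .SDcols complains about at index [2].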

The correct loop is:

for(i in 1:(length(cols)-1)) {
  for (j in (i+1):length(cols)) {
    print(unique(DT[,.SD, .SDcols = c(cols[i],cols[j])]))
  }   
}
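
If only the number of unique rows per column pair is needed, the nested loop can possibly be avoided altogether. The following is a minimal sketch, not part of the original answer, that assumes base R's combn() to enumerate the pairs and data.table's uniqueN() to count distinct rows; the names pairs and counts are illustrative:

```r
library(data.table)

DT <- data.table(a = 1:10, b = c(1,1,1,2,2,3,4,4,5,5),
                 c = letters[1:10], d = c(3,3,5,2,4,2,5,1,1,5))

# All 2-column combinations, as a list of character vectors
pairs <- combn(names(DT), 2, simplify = FALSE)

# Number of distinct rows for each pair of columns
counts <- sapply(pairs, function(p) uniqueN(DT, by = p))
names(counts) <- sapply(pairs, paste, collapse = "")

print(counts)
```

Because combn() generates the index pairs itself, there is no hand-written range like (i+1):length(cols) to get wrong.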
Waldi