R data.table generate random pairings in data table

Question

I have the following sample data table.

   id val
1:  a   1
2:  b   3
3:  c   2
4:  d   1

I would like to make random pairings among the ids, but I do not want an id to be paired with itself. What would be the most efficient way to do this with data.table? One approach I have tried is to first pick a random row from the data table as follows:

x = x[sample(nrow(x),1),]

but then I hit a roadblock: I would have to check that the returned row is not the current row, which would be computationally expensive. For example, a possible output would be:

   id val id.pair val.pair
1:  a   1       b        3
2:  b   3       c        2
3:  c   2       a        1
4:  d   1       a        1

Thanks in advance

Are you sure you have a data.table and not a data.frame? I don't see data.table syntax. — Roland, Aug 18 '13 at 18:04
Yes. Positively. Added the familiar ':' based row numbering. — broccoli, Aug 18 '13 at 18:11

score 3 · Accepted Answer · edited May 23 '17 at 12:23

You could use combn and sample.int like this:

df <- read.table(text="id val
a  1
b  3
c  2
d  1", header=TRUE, stringsAsFactors=FALSE)

library(data.table)
dt <- data.table(df)

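set.seed(42)
# Enumerate all unordered id pairs (one per column), then keep nrow(dt) of them at random
combis <- combn(dt[,id], 2)[,sample.int(choose(nrow(dt),2), nrow(dt))]

# Key on id so the character vectors below are looked up by id
setkey(dt, "id")
cbind(dt[combis[1,],], dt[combis[2,],])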

#    id val id val
# 1:  c   2  d   1
# 2:  b   3  d   1
# 3:  a   1  c   2
# 4:  a   1  d   1

However, if the number of IDs is large, enumerating every pair with combn becomes expensive, and you need something like this function to avoid calculating all possible combinations.
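One way to avoid enumerating combinations is to draw a partner row index for each row directly. The sketch below is not the function referred to above; it is a minimal illustration, assuming each id simply needs one random partner other than itself (as in the question's sample output). The helper name random_partner and the columns id.pair and val.pair are made up for this example.

```r
library(data.table)

dt <- data.table(id = c("a", "b", "c", "d"), val = c(1, 3, 2, 1))

# Hypothetical helper (not from the original answer): for each row i,
# draw a partner index uniformly from the other n - 1 rows.
random_partner <- function(n) {
  stopifnot(n >= 2)
  i <- seq_len(n)
  j <- sample.int(n - 1L, n, replace = TRUE)  # values in 1..(n - 1)
  j + (j >= i)                                # shift up by one to skip row i itself
}

set.seed(42)
partner <- random_partner(nrow(dt))

# Attach the partner rows as new columns on a copy of dt
pairs <- copy(dt)
pairs[, c("id.pair", "val.pair") := dt[partner, .(id, val)]]
print(pairs)
```

Because the draw is vectorized, no per-row check against the current index is needed, and the cost grows linearly with the number of rows rather than with the number of possible pairs.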

answered Aug 18 '13 at 18:23 by Roland; edited May 23 '17 at 12:23 by Community

Thanks. I was hoping there was a simpler way. Exploring setdiff – broccoli Aug 18 '13 at 18:49

score 2 · Answer 2 · Frank · 2015-10-31T15:54:24.073

Here's another way:

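# DT is the question's table as a data.table
DT <- data.table(id = c("a", "b", "c", "d"), val = c(1, 3, 2, 1))
set.seed(1)
DT[, paste0("pair.",names(DT)) := .SD[ sapply(.I, function(i) sample(.I[-i], 1)) ]]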

which gives...

   id val pair.id pair.val
1:  a   1       b        3
2:  b   3       c        2
3:  c   2       b        3
4:  d   1       c        2
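One caveat with this approach: when only a single candidate row remains (a two-row table), sample(x, 1) treats a length-one numeric x as the range 1:x, so a row can end up paired with itself. The sketch below uses the resample helper shown in the examples of ?sample to guard against that; the table DT2 and the explicit column names are assumptions made for the illustration.

```r
library(data.table)

# Helper from the examples in ?sample: index into x explicitly so that a
# length-one x is never expanded to 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Two-row table (made up for this example): each row has exactly one valid partner
DT2 <- data.table(id = c("a", "b"), val = c(1, 3))

DT2[, c("pair.id", "pair.val") :=
      .SD[sapply(.I, function(i) resample(.I[-i], 1))]]
print(DT2)
```

With two rows the result is fully determined: each row is paired with the other one, regardless of the random seed.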

answered Aug 19 '13 at 15:36 by Frank; edited Oct 31 '15 at 15:54
