How to take sample from two columns of R data frame jointly?

Question

I have a data frame with 4 columns. I am trying to shuffle two columns of the data frame together such that those two columns are always related.

I have tried the 'sample' function, but it is limited to a single column of a data frame.


data = data.frame(label=letters[1:5], label2=letters[1:5], number=11:15)
data = within(data, numbersq <- (number*number))

# label label2 number numbersq
#   a     a      11     121
#   b     b      12     144
#   c     c      13     169
#   d     d      14     196
#   e     e      15     225

#Now, I want to tweak the data so that columns 'label' and 'label2' stay as they are while columns 'number' and 'numbersq' are shuffled.
#As the desired output shows, 'number' and 'numbersq' should be shuffled together, not separately.

#Desired Output

# label label2 number numbersq
#   a     a      15     225
#   b     b      13     169
#   c     c      14     196
#   d     d      12     144
#   e     e      11     121

I have tried the following code, but it shuffles the two columns separately.

data_2 = data.frame(data$label, data$label2, sample(data$number), sample(data$numbersq))

In general, how can I apply R's sample function to two columns together so that they do not lose their relationship with each other? – Gaurav Kothari, Oct 28 '19 at 22:25

score 0 · Answer 1 · answered Oct 28 '19 at 22:43

Take a sample of the rows. For example, if you want a sample of 5 rows:

set.seed(1)
row_sample <- sample(1:nrow(data),5)
data[row_sample,]
#  label label2 number numbersq
#7     g      g     17      289
#2     b      b     12      144
#3     c      c     13      169
#8     h      h     18      324
#1     a      a     11      121

fmarm

I guess my question was not clear. I have updated the question. It would be great if you could provide the solution. – Gaurav Kothari Oct 29 '19 at 03:15
I think he wants: `cbind(data[,1:2], data[row_sample,3:4])` – qdread Oct 29 '19 at 14:12
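
To illustrate why the attempt in the question breaks the pairing, note that it calls sample() twice, so each column gets its own random order. A minimal sketch of the alternative, assuming the example data from the question: draw one permutation of the row positions and reuse it for both columns. The names `idx` and `shuffled` and the seed value are illustrative choices, not part of the thread.

```r
# Rebuild the example data from the question.
data <- data.frame(label = letters[1:5], label2 = letters[1:5], number = 11:15)
data$numbersq <- data$number^2

set.seed(42)  # arbitrary seed, only so the run is reproducible

# One permutation of the row positions, reused for both columns.
idx <- sample(nrow(data))

shuffled <- data
shuffled$number   <- data$number[idx]
shuffled$numbersq <- data$numbersq[idx]

# The labels stay in place and every pair is still (n, n^2).
stopifnot(identical(shuffled$label, data$label))
stopifnot(all(shuffled$numbersq == shuffled$number^2))
```

Because both columns are indexed by the same `idx`, each value of 'number' moves together with its square.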

score 0 · Answer 2 · answered Oct 29 '19 at 16:41

Thank you so much for the suggestions. I finally found a solution; the code is below. I believe it can still be optimized.


data <- data.frame(label=letters[1:5], label2=letters[1:5], number=11:15)
data = within(data, numbersq <- (number*number))
print(data)

# label label2 number numbersq
#   a     a      11     121
#   b     b      12     144
#   c     c      13     169
#   d     d      14     196
#   e     e      15     225


data_2a = data[,1:2]
data_2b = data[,3:4]
data_2b_samp = data_2b[sample(nrow(data_2b)), ]

data_3 = cbind(data_2a, data_2b_samp)

print(data_3)

# label label2 number numbersq
#   a     a      15     225
#   b     b      13     169
#   c     c      14     196
#   d     d      12     144
#   e     e      11     121
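
One possible way to shorten the split-and-cbind steps above is to assign the permuted rows back into the chosen columns in a single step. The sketch below wraps that in a helper; `shuffle_columns_together` is a hypothetical name introduced here for illustration, not an existing R function, and the code assumes the columns are selected by name.

```r
# Hypothetical helper (not from the thread): shuffle the given columns
# with one shared row permutation, leaving all other columns untouched.
shuffle_columns_together <- function(df, cols) {
  idx <- sample(nrow(df))                  # one random order of the rows
  df[cols] <- df[idx, cols, drop = FALSE]  # same order applied to every column in cols
  df
}

data <- data.frame(label = letters[1:5], label2 = letters[1:5], number = 11:15)
data$numbersq <- data$number^2

data_3 <- shuffle_columns_together(data, c("number", "numbersq"))

# Labels are unchanged and each number still sits next to its square.
stopifnot(identical(data_3$label, data$label))
stopifnot(all(data_3$numbersq == data_3$number^2))
```

The same helper should work for any number of columns, since every column listed in `cols` is reordered by the same `idx`.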
