I have a data frame with 4 columns. I am trying to shuffle two of those columns together so that the values in the two columns stay paired row by row.

I have tried the 'sample' function, but it is limited to a single column of the data frame.


data = data.frame(label=letters[1:5], label2=letters[1:5], number=11:15)
data = within(data, numbersq <- (number*number))

# label label2 number numbersq
#   a     a      11     121
#   b     b      12     144
#   c     c      13     169
#   d     d      14     196
#   e     e      15     225

# Now I want to tweak the data so that columns 'label' and 'label2' stay as they are while columns 'number' and 'numbersq' are shuffled.
# As the desired output shows, 'number' and 'numbersq' should be shuffled together, not separately.

#Desired Output

# label label2 number numbersq
#   a     a      15     225
#   b     b      13     169
#   c     c      14     196
#   d     d      12     144
#   e     e      11     121

I have tried the following code, but it seems to shuffle the columns separately.

data_2 = data.frame(label=data$label, label2=data$label2, number=sample(data$number), numbersq=sample(data$numbersq))

  • In general, how can I apply R's sample function to two columns together so that they do not lose their relationship with each other? – Gaurav Kothari Oct 28 '19 at 22:25

2 Answers

Take a sample of the rows. For example, if you want a sample of 5 rows:

set.seed(1)
row_sample <- sample(seq_len(nrow(data)), 5)
data[row_sample, ]
# (output from a data frame with more rows than the question's 5-row example)
#  label label2 number numbersq
#7     g      g     17      289
#2     b      b     12      144
#3     c      c     13      169
#8     h      h     18      324
#1     a      a     11      121
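
The same idea can be narrowed to the two columns in the question. The following is only a minimal sketch, assuming the 5-row data frame from the question; the names `idx` and `shuffled` and the seed value are illustrative and not part of the original answer. One permutation of the row indices is drawn once and reused for both columns, so each `number` keeps its `numbersq`.

```r
set.seed(42)  # arbitrary seed, used here only to make the run repeatable

data <- data.frame(label = letters[1:5], label2 = letters[1:5], number = 11:15)
data$numbersq <- data$number^2

# Draw ONE random permutation of the row indices...
idx <- sample(nrow(data))

# ...and apply that same permutation to both columns.
shuffled <- data
shuffled$number   <- data$number[idx]
shuffled$numbersq <- data$numbersq[idx]

# The pairing survives: every numbersq is still the square of its number.
all(shuffled$numbersq == shuffled$number^2)  # TRUE
```

Calling `sample()` separately on each column draws two unrelated permutations, which is why the attempt in the question breaks the pairing.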
fmarm

Thank you so much for the suggestions. I finally found a solution; the code is below. I believe it can still be optimized.


data <- data.frame(label=letters[1:5], label2=letters[1:5], number=11:15)
data = within(data, numbersq <- (number*number))
print(data)

# label label2 number numbersq
#   a     a      11     121
#   b     b      12     144
#   c     c      13     169
#   d     d      14     196
#   e     e      15     225


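# Split into the columns to keep fixed and the columns to shuffle
data_2a = data[, 1:2]
data_2b = data[, 3:4]
# Permute the rows of the second block once, so 'number' and 'numbersq' move together
data_2b_samp = data_2b[sample(nrow(data_2b)), ]

# Recombine the fixed columns with the shuffled block
data_3 = cbind(data_2a, data_2b_samp)

print(data_3)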

# One possible result (the order changes on every run):
# label label2 number numbersq
#   a     a      15     225
#   b     b      13     169
#   c     c      14     196
#   d     d      12     144
#   e     e      11     121
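
One possible way to shorten this, offered as a sketch and not as the definitive approach: assign the row-permuted block back into the same columns, which avoids the split and `cbind` steps. The helper name `shuffle_cols_together` and its arguments are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: shuffle the columns named in `cols` as one block,
# leaving every other column of `df` in its original order.
shuffle_cols_together <- function(df, cols) {
  # A single call to sample() permutes whole rows of the selected block,
  # so values within a row stay together.
  df[cols] <- df[sample(nrow(df)), cols, drop = FALSE]
  df
}

data <- data.frame(label = letters[1:5], label2 = letters[1:5], number = 11:15)
data <- within(data, numbersq <- number * number)

result <- shuffle_cols_together(data, c("number", "numbersq"))
print(result)
```

Because the assignment goes into the existing data frame, the untouched columns and the row names stay as they were, and the same helper works for any number of columns that need to move together.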