Generating random number by length of blocks of data in R data frame

Question

I am trying to simulate the measuring order n times to see how the measuring order affects my study subject. To do this, I am trying to generate random integers in a new column of a data frame. I have a big data frame, and I would like to add a column that contains a random number based on the number of observations in each block.

Example data (each row is an observation):

df <- data.frame(A=c(1,1,1,2,2,3,3,3,3), 
                 B=c("x","b","c","g","h","g","g","u","l"), 
                 C=c(1,2,4,1,5,7,1,2,5))


  A B C
1 1 x 1
2 1 b 2
3 1 c 4
4 2 g 1
5 2 h 5
6 3 g 7
7 3 g 1
8 3 u 2
9 3 l 5

What I'd like to do is add a column D that holds random integers from 1 up to the length of each block, with no repeats within a block. Blocks are defined by column A.

Result should look something like this:

df <- data.frame(A=c(1,1,1,2,2,3,3,3,3), 
                 B=c("x","b","c","g","h","g","g","u","l"), 
                 C=c(1,2,4,1,5,7,1,2,5),
                 D=c(2,1,3,2,1,4,3,1,2))

> df
  A B C D
1 1 x 1 2
2 1 b 2 1
3 1 c 4 3
4 2 g 1 2
5 2 h 5 1
6 3 g 7 4
7 3 g 1 3
8 3 u 2 1
9 3 l 5 2

I have tried to use R's sample() function to generate the random numbers, but my problem is splitting the data by block and adding the new column. Any help is greatly appreciated.

Welcome to SO, and well done for providing a reproducible example. — Richie Cotton, Jan 06 '12 at 13:05

John · Answer 1 · 2012-01-06T13:29:25.253

4

This can be done easily with ave():

df$D <- ave( df$A, df$A, FUN = function(x) sample(length(x)) )

(You could replace length() with max(), or whatever suits your data, but length() works even when the values in A are not numbers that match the lengths of their blocks.)
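As a minimal sketch of how one might check the result, the snippet below applies the same ave() idea to the example data and confirms that every block contains the integers 1..n exactly once. The seed value and the name is_permutation are illustrative assumptions, and seq_along(df$A) is swapped in as the first argument only so that the result stays integer regardless of the type of A; with the numeric A from the question, passing df$A works just as well.

```r
# Example data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

set.seed(42)  # arbitrary seed (assumption), only to make the run repeatable

# seq_along(df$A) is a placeholder vector; FUN only uses its length per block
df$D <- ave(seq_along(df$A), df$A, FUN = function(x) sample(length(x)))

# Each block should hold the integers 1..n exactly once (n = block size)
is_permutation <- tapply(df$D, df$A, function(d) all(sort(d) == seq_along(d)))
print(is_permutation)
```

The check returns one logical value per block, so all(is_permutation) is a quick way to assert the whole column at once.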

edited Jan 06 '12 at 13:29

answered Jan 06 '12 at 13:23

John


Nice demonstration of `ave()`'s capabilities. It's a real pity that such a useful function has such a terribly misleading name. Even the documentation at `?ave` is not much better at communicating what it really does. – Josh O'Brien Jan 06 '12 at 16:55
yes, it's a bit of a shame... I'm guessing they were originally going with (ave)rage and that's how it works when you don't specify `FUN`. – John Jan 06 '12 at 22:13
This one's a great solution too! Thx for help! – Markus Korhonen Jan 07 '12 at 08:20

Richie Cotton · Accepted Answer · 2012-01-06T13:31:38.663

2

This is really easy with ddply from plyr.

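library(plyr)  # provides ddply() and .()

# Returns a new data frame with column D added; assign it to keep the result
ddply(df, .(A), transform, D = sample(length(A)))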

The longer manual version is:

Use split to split the data frame by the first column.

split_df <- split(df, df$A)

Then call sample on each member of the list.

split_df <- lapply(split_df, function(block) {
  # sample(n) returns a random permutation of 1..n
  block$D <- sample(nrow(block))
  block
})

Then recombine with

df <- do.call(rbind, split_df)
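One thing to watch with the manual route: do.call(rbind, ...) on a named list tends to produce composite row names and returns the rows grouped by block rather than in their original positions. The sketch below shows two ways one might handle that, assuming the example data from the question. The names combined and restored and the seed are illustrative, not part of the answer.

```r
# Example data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

set.seed(1)  # arbitrary seed (assumption), only for repeatability

# Split by block and add a random order within each block
split_df <- lapply(split(df, df$A), function(block) {
  block$D <- sample(nrow(block))
  block
})

# Option 1: rbind the pieces, then reset the row names to 1..n
combined <- do.call(rbind, split_df)
rownames(combined) <- NULL

# Option 2: unsplit() reverses split(), putting rows back in their
# original positions even when the blocks are not contiguous
restored <- unsplit(split_df, df$A)
```

With data already sorted by A, as in the example, both options give the same row order; unsplit() matters mainly when the blocks are interleaved.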

edited Jan 06 '12 at 13:31

answered Jan 06 '12 at 12:58

Richie Cotton


@MarkusKorhonen If an answer is working well for you, click the checkmark next to it to "accept" it and indicate that you're not still hoping for more answers. – Gregor Thomas Jan 06 '12 at 17:36
@shujaa Sry, I'm new here. Thx for the tip. I marked an answer. – Markus Korhonen Jan 06 '12 at 22:36
@MarkusKorhonen, no worries, welcome to the site! And, as Richie already commented, nice work asking a high-quality first question. – Gregor Thomas Jan 06 '12 at 23:26

score 1 · Answer 3 · answered Jan 06 '12 at 13:24


One simple way:

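df$D <- 0

# Number of rows in each block of A
counts <- table(df$A)

for (i in seq_along(counts)) {
    # Fill block i with a random permutation of 1..(block size)
    df$D[df$A == names(counts)[i]] <- sample(counts[i])
}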
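Since the question mentions repeating the simulation n times, the loop could be wrapped in a small helper and rerun with replicate(). This is only a sketch: the function name add_block_order, its block_col parameter, the seed, and n_sim are all hypothetical and not from the answer.

```r
# Hypothetical helper: adds a column D with a random order within each block
add_block_order <- function(data, block_col = "A") {
  data$D <- 0
  counts <- table(data[[block_col]])
  for (i in seq_along(counts)) {
    in_block <- data[[block_col]] == names(counts)[i]
    # sample(n) returns a random permutation of 1..n
    data$D[in_block] <- sample(as.integer(counts[i]))
  }
  data
}

# Example data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

set.seed(123)  # arbitrary seed (assumption), only for repeatability
n_sim <- 5     # number of simulated measuring orders (assumption)

# One column of D values per simulation
orders <- replicate(n_sim, add_block_order(df)$D)
dim(orders)  # 9 rows (observations) by 5 columns (simulations)
```

Keeping the simulated orders in a matrix, one column per run, leaves the original data frame untouched between simulations.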


David Robinson

