4

I am trying to simulate the measuring order n times and see how the measuring order affects my study subject. To do this, I am trying to generate random integers in a new column of a data frame. I have a big data frame, and I would like to add a column that contains random numbers based on the number of observations in each block.

Example data (each row is an observation):

df <- data.frame(A=c(1,1,1,2,2,3,3,3,3), 
                 B=c("x","b","c","g","h","g","g","u","l"), 
                 C=c(1,2,4,1,5,7,1,2,5))


  A B C
1 1 x 1
2 1 b 2
3 1 c 4
4 2 g 1
5 2 h 5
6 3 g 7
7 3 g 1
8 3 u 2
9 3 l 5

What I'd like to do is add a column D containing random integers based on the length of each block. Blocks are defined in column A.

Result should look something like this:

df <- data.frame(A=c(1,1,1,2,2,3,3,3,3), 
                 B=c("x","b","c","g","h","g","g","u","l"), 
                 C=c(1,2,4,1,5,7,1,2,5),
                 D=c(2,1,3,2,1,4,3,1,2))

> df
  A B C D
1 1 x 1 2
2 1 b 2 1
3 1 c 4 3
4 2 g 1 2
5 2 h 5 1
6 3 g 7 4
7 3 g 1 3
8 3 u 2 1
9 3 l 5 2

I have tried to use R's sample() function to generate the random numbers, but my problem is splitting the data according to block length and adding the new column. Any help is greatly appreciated.

joran

3 Answers

4

This can be done easily with ave():

df$D <- ave( df$A, df$A, FUN = function(x) sample(length(x)) )

(You could replace length() with max() or some other function, but length() works even when the values in A do not match the sizes of their blocks.)
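
To make the behaviour concrete, here is a minimal sketch of the ave() approach on the example data from the question. The seed value and the names is_permutation, n_sim and orders are illustrative choices, not part of the original answer. It checks that every block receives a permutation of 1 to its block size, and then uses replicate() to draw several independent orders, which is one possible way to approach the "simulate n times" part of the question.

```r
# Example data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

set.seed(42)  # arbitrary seed, only to make the draw reproducible

# Within each block of A, draw a random permutation of 1..(block size)
df$D <- ave(df$A, df$A, FUN = function(x) sample(length(x)))

# Check: sorting D within each block should give 1, 2, ..., n
is_permutation <- tapply(df$D, df$A, function(d) all(sort(d) == seq_along(d)))
stopifnot(all(is_permutation))

# Hypothetical extension: draw n_sim independent measuring orders.
# Each column of `orders` is one simulated order for all rows of df.
n_sim <- 5
orders <- replicate(n_sim, ave(df$A, df$A, FUN = function(x) sample(length(x))))
dim(orders)
```

Because ave() returns its result in the original row order, the new column lines up with the data frame without any re-sorting.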

John
  • Nice demonstration of `ave()`'s capabilities. It's a real pity that such a useful function has such a terribly misleading name. Even the documentation at `?ave` is not much better at communicating what it really does. – Josh O'Brien Jan 06 '12 at 16:55
  • yes, it's a bit of a shame... I'm guessing they were originally going with (ave)rage and that's how it works when you don't specify `FUN`. – John Jan 06 '12 at 22:13
  • This one's a great solution too! Thx for help! – Markus Korhonen Jan 07 '12 at 08:20
2

This is really easy with ddply from plyr.

library(plyr)

ddply(df, .(A), transform, D = sample(length(A)))

The longer manual version is:

Use split to split the data frame by the first column.

split_df <- split(df, df$A)

Then call sample on each member of the list.

split_df <- lapply(split_df, function(df) 
{
  df$D <- sample(nrow(df))
  df
})

Then recombine with

df <- do.call(rbind, split_df)
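
The manual steps above can be put together as one runnable sketch. One detail to be aware of is that rbind() on a named list builds row names from the list names, so the sketch resets them afterwards. The names block and res are illustrative, and resetting the row names is an optional extra step, not part of the original answer.

```r
# Example data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

# 1. Split into a list with one data frame per block of A
split_df <- split(df, df$A)

# 2. Add a random permutation of 1..nrow to each block
split_df <- lapply(split_df, function(block) {
  block$D <- sample(nrow(block))
  block
})

# 3. Recombine the blocks into a single data frame
res <- do.call(rbind, split_df)

# The combined row names carry the block label as a prefix;
# setting them to NULL restores the default 1..n numbering
rownames(res) <- NULL
res
```

Note that the result is ordered by block. For data that are not already sorted by A, this changes the row order relative to the input.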
Richie Cotton
1

One simple way:

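df$D <- 0

# Number of observations in each block of A
counts <- table(df$A)

for (i in seq_along(counts)) {
    # Fill the rows of block i with a random permutation of 1..counts[i]
    df$D[df$A == names(counts)[i]] <- sample(counts[i])
}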
David Robinson