
I have an R script that allows me to select a sample size and take fifty individual random samples with replacement. Below is an example of this code:

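    library(data.table)

    ## Creates data frame
    df = as.data.table(data)

    ## Select sample size
    sample.size = 5

    ## Creates Sample 1 (Size 5)
    ## .() keeps the column name Dollars (a bare vector in j would come back as V1)
    Sample.1 <- df[, .(Dollars = Dollars[sample(.N, size = sample.size, replace = TRUE)]),
                   by = Num]
    Sample.1[, Sample := "01"]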

In the script above, I first create a data frame and then select my sample size, which in this case is 5. That produces just one sample. Because of my lack of experience with R, I repeat this code 49 more times. The last copy looks like this:

    ## Creates Sample 50 (Size 5)
    Sample.50 <- df[, .(Dollars = Dollars[sample(.N, size = sample.size, replace = TRUE)]),
                    by = Num]
    Sample.50[, Sample := "50"]

The sample output would look something like this (Sample Range 1 - 50):

Num  Dollars   Sample
  1    85000       01
  1     4900       01
  1    18000       01
  1     6900       01
  1    11000       01
  1     8800       50
  1     3800       50
  1    10400       50
  1     2200       50
  1    29000       50

Note that the variable 'Num' was created for grouping purposes and has little to no bearing on my overall question (posted below).

Instead of repeating this code fifty times to get fifty individual samples (each of size 5), is there a loop I can write to shorten my code? I have recently been asked to create ten thousand random samples, each of size 5. I obviously cannot repeat this code ten thousand times, so I need some sort of loop.

A sample of my final output should look something like this (Sample Range 1 - 10,000):

Num  Dollars   Sample
  1    85000       01
  1     4900       01
  1    18000       01
  1     6900       01
  1    11000       01
  1     9900    10000
  1     8300    10000
  1    10700    10000
  1     6800    10000
  1    31000    10000

Thank you all in advance for your help; it's greatly appreciated.

Here is some sample data if needed:

Num Dollars
1   31002
1   13728
1   23526
1   80068
1   86244
1   9330
1   27169
1   13694
1   4781
1   9742
1   20060
1   35230
1   15546
1   7618
1   21604
1   8738
1   5299
1   12081
1   7652
1   16779
YimYames
  • @beginneR, sorry, I'm trying to provide something useful for you to use as an example set. I want, for example, 10,000 random samples (each with a sample size of 5) drawn from this data set. The only variable of interest is Dollars; pay no attention to 'Num'. Does this help? – YimYames Jul 28 '14 at 18:46
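To make the examples reproducible, the sample data above can be typed into R directly. A minimal sketch, assuming the object should be called `data` (the name the question's `as.data.table(data)` call expects) and that the data.table package is installed:

```r
library(data.table)

# The 20 example rows from the question; Num is only a grouping column (all 1 here)
data <- data.frame(
  Num = 1L,
  Dollars = c(31002, 13728, 23526, 80068, 86244, 9330, 27169, 13694, 4781, 9742,
              20060, 35230, 15546, 7618, 21604, 8738, 5299, 12081, 7652, 16779)
)

# Convert to a data.table, as the question's script does
df <- as.data.table(data)
```

`Num = 1L` is recycled to all 20 rows by `data.frame()`.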

2 Answers


A very simple method would be to use a for loop and store the results in a list:

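lst <- list()

# 3 samples for illustration; use seq_len(10000) for the full task
for(i in seq_len(3)){
  # draw 5 rows with replacement from the data.frame df
  lst[[i]] <- df[sample(seq_len(nrow(df)), 5, replace = TRUE),]
  # label every row of this sample with its number (works for a data.frame)
  lst[[i]]["Sample"] <- i
}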

> lst
[[1]]
     Num Dollars Sample
20     1   16779      1
1      1   31002      1
12     1   35230      1
14     1    7618      1
14.1   1    7618      1

[[2]]
     Num Dollars Sample
9      1    4781      2
13     1   15546      2
12     1   35230      2
17     1    5299      2
12.1   1   35230      2

[[3]]
   Num Dollars Sample
1    1   31002      3
7    1   27169      3
17   1    5299      3
5    1   86244      3
6    1    9330      3

Then, to create a single data.frame, use do.call to rbind the list elements together:

do.call(rbind, lst)
     Num Dollars Sample
20     1   16779      1
1      1   31002      1
12     1   35230      1
14     1    7618      1
14.1   1    7618      1
9      1    4781      2
13     1   15546      2
121    1   35230      2
17     1    5299      2
12.1   1   35230      2
11     1   31002      3
7      1   27169      3
171    1    5299      3
5      1   86244      3
6      1    9330      3
talat
  • Just one more thing: the script works fine when I don't include the following piece of code `lst[[i]]["Sample"] <- i`, but I do need this piece. Any suggestions? – YimYames Jul 28 '14 at 19:39
  • What happens when you include it? What error message? – talat Jul 28 '14 at 19:42
  • Here is the error I receive: "error in `[.data.table`(x, i, which = TRUE) : When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey." – YimYames Jul 28 '14 at 19:45
  • Well, is there a reason you use a `data.table` instead of a `data.frame`? (Note that in your question you write that you create a data.frame, but in fact you create a data.table.) Unfortunately I'm not a data.table expert. – talat Jul 28 '14 at 19:50
  • I am all set now. Thanks again for everything. – YimYames Jul 28 '14 at 20:16
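The error in the comments comes from `lst[[i]]["Sample"] <- i`: on a data.table, a character vector inside `[` is treated as rows to look up (a join) rather than as a column name. A sketch of the same loop written for a data.table, assuming `df` is the data.table from the question (a five-row stand-in taken from the question's sample data is built here so the block runs on its own). It adds the column by reference with `:=` and stacks the samples with data.table's `rbindlist()`:

```r
library(data.table)

# Stand-in for the question's `df`: the first five rows of its sample data
df <- data.table(Num = 1L, Dollars = c(31002, 13728, 23526, 80068, 86244))

sample.size <- 5
n.samples   <- 3    # use 10000 for the full task

lst <- vector("list", n.samples)    # pre-allocate one slot per sample
for (i in seq_len(n.samples)) {
  # draw sample.size rows with replacement
  s <- df[sample(nrow(df), sample.size, replace = TRUE)]
  # add the sample number by reference (replaces lst[[i]]["Sample"] <- i)
  s[, Sample := i]
  lst[[i]] <- s
}

# data.table's counterpart to do.call(rbind, lst)
result <- rbindlist(lst)
```

Assigning `:=` on the local `s` before storing it in the list keeps the by-reference update on a plain data.table object rather than on a list element.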

It's worth noting that if you're sampling with replacement, then drawing 50 (or 10,000) samples of size 5 is equivalent to drawing one sample of size 250 (or 50,000). Thus I would do it like this (you'll see I stole a line from @beginneR's answer):

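library(data.table)

df = as.data.table(data)

## Select sample size
sample.size = 5
n.samples = 10000

# Sample and assign groups: one draw of sample.size * n.samples rows,
# then label each consecutive block of sample.size rows as one sample
draws <- df[sample(seq_len(nrow(df)), sample.size * n.samples, replace = TRUE), ]
draws[, Sample := rep(1:n.samples, each = sample.size)]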
Gregor Thomas
  • Thanks a lot for your input. The code works great and is very conceptual. The 'Sample' variable also appears in my final output. – YimYames Jul 28 '14 at 19:49
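The single-draw idea from the last answer does not depend on data.table. A sketch of the same approach for a plain data.frame, assuming `data` holds the question's Num/Dollars columns (a five-row stand-in from the question's sample data is built here so the block runs on its own):

```r
# Stand-in for the question's data: first five rows of the sample data
data <- data.frame(Num = 1L, Dollars = c(31002, 13728, 23526, 80068, 86244))

sample.size <- 5
n.samples   <- 10000

# One draw of sample.size * n.samples row indices, with replacement
idx   <- sample(nrow(data), sample.size * n.samples, replace = TRUE)
draws <- data[idx, ]

# Consecutive blocks of sample.size rows form samples 1, 2, ..., n.samples
draws$Sample <- rep(seq_len(n.samples), each = sample.size)
rownames(draws) <- NULL   # drop the "1.1", "3.2", ... names created by repeated rows
```

Because every row is drawn independently with replacement, splitting one long draw into blocks gives the same kind of samples as drawing each block separately, which is the point the answer makes.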