Designing my stratified sample

library(survey)
design <- svydesign(id = ~1, strata = ~Category, data = billa, fpc = ~fpc)

So far so good, but how can I now draw a stratified sample, the same way I could for simple random sampling?

set.seed(67359)  
samplerows <- sort(sample(x=1:N, size=n.pre$n))
Andrie
Roland Kofler

4 Answers

If you have a stratified design, then I believe you can sample randomly within each stratum. Here is a short algorithm to do proportional sampling in each stratum, using ddply:

library(plyr)
set.seed(1)
dat <- data.frame(
    id = 1:100,
    Category = sample(LETTERS[1:3], 100, replace=TRUE, prob=c(0.2, 0.3, 0.5))
)

sampleOne <- function(id, fraction=0.1){
  sort(sample(id, round(length(id)*fraction)))
}

ddply(dat, .(Category), summarize, sampleID=sampleOne(id, fraction=0.2))

   Category sampleID
1         A       21
2         A       29
3         A       72
4         B       13
5         B       20
6         B       42
7         B       58
8         B       82
9         B      100
10        C        1
11        C       11
12        C       14
13        C       33
14        C       38
15        C       40
16        C       63
17        C       64
18        C       71
19        C       92
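
One caveat with sampleOne: if a stratum holds a single numeric id, sample(id, k) treats that number as the range 1:id, and round() can return 0 for small strata. A minimal base R sketch that guards against both is shown here. The helper name sample_stratum and its min_n argument are made up for illustration, and the code assumes the same dat layout as above.

```r
# Hypothetical helper: sample a fraction of the ids in one stratum.
sample_stratum <- function(id, fraction = 0.1, min_n = 1) {
  # take at least min_n ids, but never more than the stratum holds
  k <- min(length(id), max(min_n, round(length(id) * fraction)))
  # sample positions, not values, so a lone id such as 42 is not read as 1:42
  sort(id[sample.int(length(id), k)])
}

set.seed(1)
dat <- data.frame(
  id = 1:100,
  Category = sample(LETTERS[1:3], 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
)

# one vector of sampled ids per stratum
picked <- lapply(split(dat$id, dat$Category), sample_stratum, fraction = 0.2)

# rows of the original data that belong to the sample
stratified <- dat[dat$id %in% unlist(picked), ]
table(stratified$Category)
```

Sampling positions with sample.int() instead of values is the usual way around the single-element behaviour of sample().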
Andrie
  • Thank you. I wish R provided such a function. The frustration is that I always have to discover what I need to build on my own and what R already provides. I am not a statistician, so I always fear making mistakes. – Roland Kofler Oct 31 '11 at 11:55
  • Nice one! `sampleOne` is the function I was looking for. `tapply(dat$id, dat$Category, sampleOne, fraction = 0.1)` will work very well with small samples too. – hpesoj626 Feb 03 '18 at 06:47

Take a look at the sampling package on CRAN, and the strata function in particular.

This is a good package to know if you're doing surveys; there are several vignettes available from its page on CRAN.
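
As a rough sketch of how the strata function might be used for proportional sampling: the data frame dat and the 20% fraction below are invented for illustration, and the calls assume the sampling package's strata() and getdata() behave as described in its documentation.

```r
library(sampling)

set.seed(1)
# illustrative population with one stratification variable
dat <- data.frame(
  id = 1:100,
  Category = sample(LETTERS[1:3], 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
)

# strata() expects the data sorted by the stratification variable,
# with `size` given in the order the strata appear in the data
dat <- dat[order(dat$Category), ]
sizes <- round(0.2 * as.vector(table(dat$Category)))

# simple random sampling without replacement within each stratum
s <- strata(dat, stratanames = "Category", size = sizes, method = "srswor")

# pull the sampled rows; ID_unit, Prob and Stratum are appended as columns
stratified <- getdata(dat, s)
head(stratified)
```

The Prob column holds each unit's inclusion probability, which is useful later when computing weights.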

The task view on "Official Statistics" includes several topics that are closely related to these issues of survey design and sampling - browsing through it and the packages recommended may also introduce other tools that you can use in your work.

Iterator

You can draw a stratified sample using dplyr. First group by the column or columns of interest, then sample within each group. This example draws 3 records from each Species.

library(dplyr)
set.seed(1)
iris %>%
  group_by(Species) %>%
  sample_n(3)

Output:

Source: local data frame [9 x 5]
Groups: Species

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          4.3         3.0          1.1         0.1     setosa
2          5.7         3.8          1.7         0.3     setosa
3          5.2         3.5          1.5         0.2     setosa
4          5.7         3.0          4.2         1.2 versicolor
5          5.2         2.7          3.9         1.4 versicolor
6          5.0         2.3          3.3         1.0 versicolor
7          6.5         3.0          5.2         2.0  virginica
8          6.4         2.8          5.6         2.2  virginica
9          7.4         2.8          6.1         1.9  virginica
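
In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), which also accepts a proportion. A minimal sketch, assuming a recent dplyr is installed (the object names by_n and by_prop are illustrative only):

```r
library(dplyr)
set.seed(1)

# fixed number of rows per stratum
by_n <- iris %>%
  group_by(Species) %>%
  slice_sample(n = 3) %>%
  ungroup()

# fixed fraction of rows per stratum (proportional allocation)
by_prop <- iris %>%
  group_by(Species) %>%
  slice_sample(prop = 0.2) %>%
  ungroup()

# number of sampled rows in each stratum
count(by_prop, Species)
```

Calling ungroup() at the end keeps later operations from silently running per group.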
mpalanco

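Here's a quick way to sample up to three records per distinct 'carb' value from the mtcars data frame without replacement. Two carb values (6 and 8) have only one row each, so the helper caps the sample at the group size and samples by position; a bare sample( x , 3 ) on a single row index x would draw from 1:x instead.

# choose how many records to sample per unique 'carb' value
records.per.carb.value <- 3

# sample row positions within one group; indexing by position means a
# one-row group is never read by sample() as the range 1:x
sample.rows <- function( i , n ) i[ sample.int( length( i ) , min( n , length( i ) ) ) ]

# draw the sample
your.sample <- 
    mtcars[ 
        unlist( 
            tapply( 
                1:nrow( mtcars ) , 
                mtcars$carb , 
                sample.rows , 
                records.per.carb.value 
            ) 
        ) , ]

# print the results to the screen
your.sample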

Note that the survey package is mostly used for analyzing complex sample survey data, not for creating it. @Iterator is right that you should check out the sampling package for more advanced ways to draw complex survey samples. :)
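
To connect the two steps, here is a hedged sketch of drawing a stratified sample first and then handing it to svydesign for analysis. It uses iris as a stand-in population and 10 rows per stratum; both choices, along with the object names, are assumptions for illustration and not part of the original question's data.

```r
library(survey)
set.seed(1)

# population stratum sizes (iris stands in for the population)
pop_sizes <- table(iris$Species)

# draw 10 rows per stratum by sampling row positions within each stratum
rows <- unlist(lapply(
  split(seq_len(nrow(iris)), iris$Species),
  function(i) i[sample.int(length(i), 10)]
))
smp <- iris[rows, ]

# fpc: population size of the stratum each sampled row came from
smp$fpc <- as.numeric(pop_sizes[as.character(smp$Species)])

# describe the design of the sample that was actually drawn, then analyze it
design <- svydesign(id = ~1, strata = ~Species, data = smp, fpc = ~fpc)
svymean(~Sepal.Length, design)
```

The design object is built from the sampled rows, not the full population, which is why the sample has to be drawn before svydesign is called.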

Anthony Damico