I'm new to R, so please bear with me.
I'm trying to perform stratified sampling that uses two columns as the strata, with both columns restricted to specific values.
This is my code:
library(splitstackshape)
set.seed(1)
dat1 <- data.frame(ID = 1:100,
                   A = sample(c("AA", "BB", "CC", "DD", "EE"), 100, replace = TRUE),
                   B = sample(c(30, 40, 50), 100, replace = TRUE),
                   C = sample(c(1:10), 100, replace = TRUE),
                   D = sample(c("CA", "NY", "TX"), 100, replace = TRUE),
                   E = sample(c("M", "F"), 100, replace = TRUE))
stratified(dat1, c("B", "C"), 0.1, select = list(B = 30, C = c(8:10)))
To my understanding, this function first generates a stratified sample of 10% and then keeps only the records that satisfy the condition B = 30 and C between 8 and 10.
As a result, the sample ends up smaller than the initial 10%.
My question is: is there any way to generate a sample that consists only of records where column B is 30 and column C is between 8 and 10, with the nrow() of the resulting sample being 10% of the original data frame?
I'm using stratified() from "splitstackshape". If stratified() cannot handle this, are there any other packages that can perform this kind of operation?
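For illustration, the result I'm after would look roughly like this base R sketch. It is only a rough idea, not a stratified() solution: the names eligible, target, and picked are made up for the example, the data frame is a cut-down stand-in for dat1, and capping target at the number of matching rows is my own assumption for the case where fewer than 10 rows match.

```r
set.seed(1)

# Cut-down stand-in for dat1 (only the columns needed here).
dat1 <- data.frame(
  ID = 1:100,
  B  = sample(c(30, 40, 50), 100, replace = TRUE),
  C  = sample(1:10, 100, replace = TRUE)
)

# Keep only rows that satisfy both conditions: B is 30 and C is in 8..10.
eligible <- dat1[dat1$B == 30 & dat1$C %in% 8:10, ]

# Target size is 10% of the ORIGINAL data frame, capped at the number of
# eligible rows (assumption: sampling without replacement).
target <- min(round(0.1 * nrow(dat1)), nrow(eligible))

# Draw row positions without replacement from the eligible rows.
picked <- eligible[sample.int(nrow(eligible), target), ]

nrow(picked)
```

Here every row in picked meets the B and C conditions, and the row count is tied to nrow(dat1) rather than to the size of the filtered subset.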