I am trying to do sampling with content balancing using only base R functions. How can I ensure that at least one row from a given group ('a' or 'b') gets selected?

a <- cbind(matrix(1:36,ncol=3),rbind(as.matrix(rep('a',each=10)),as.matrix(rep('b', each=2))))

b <- 1:5
for (i in b){
  draw <- sample(nrow(a),1)
  a <- a[-draw,] # remove that row
}
a

Using this approach I may or may not get 'b'. How do I make sure that a row from group b is always picked at least once?

Sam
  • stratified sampling: sample from each group separately, selecting each sub-sample according to some rule (eg, 90% group a and 10% group b). – lmo Dec 15 '16 at 13:25
  • And you can get stratified sampling from the function strata in the sampling package – G5W Dec 15 '16 at 13:31
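
The stratified approach suggested in these comments can be sketched in base R roughly as follows. This is only an illustration: the allocation `n_per_group` (4 rows from 'a', 1 row from 'b') is an assumption chosen for the example, not something given in the question.

```r
# Same toy data as in the question: 10 rows in group "a", 2 rows in group "b"
a <- data.frame(matrix(1:36, ncol = 3),
                X4 = rep(c("a", "b"), times = c(10, 2)))

# Assumed allocation (hypothetical): number of rows to draw from each group
n_per_group <- c(a = 4, b = 1)

# Row indices of `a`, split by group
rows_by_group <- split(seq_len(nrow(a)), a$X4)

# Sample within each group separately, without replacement.
# Indexing with sample.int() avoids the sample(x) surprise when x has length 1.
picked <- unlist(lapply(names(n_per_group), function(g) {
  idx <- rows_by_group[[g]]
  idx[sample.int(length(idx), n_per_group[[g]])]
}))

stratified <- a[picked, ]
table(stratified$X4)
```

Because each group is sampled on its own, every group with a non-zero allocation is guaranteed to appear in the result.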

1 Answer

It's a very basic solution, not very pretty, but I tried sticking to base functions. This returns a sample of size b from a that contains at least one row for which a[,4] == "b".

Edit: updated to use only base functions as requested, and to work both when at least one "a" needs to be drawn and when at least one "b" needs to be drawn.

a <- data.frame(matrix(1:36, ncol = 3), rep(c("a", "b"), times = c(10, 2)))
names(a) <- c("X1", "X2", "X3", "X4")

b <- 5  # total sample size

draw <- sample(1:nrow(a), b - 1, replace = FALSE)  # draw b-1 row indices
a2 <- a[draw, ]   # rows drawn so far
a3 <- a[-draw, ]  # rows not drawn yet
if (sum(a2[, 4] == "b") == 0) {
  # a2 has no "b" in column 4: draw the last row from the "b" rows
  draw2 <- c(draw, sample(rownames(a[which(a$X4 == "b"), ]), 1, replace = FALSE))
} else if (sum(a2[, 4] == "a") == 0) {
  # a2 has no "a" in column 4: draw the last row from the "a" rows
  draw2 <- c(draw, sample(rownames(a[which(a$X4 == "a"), ]), 1, replace = FALSE))
} else {
  # both groups already present: draw the last row from rows in a but not in a2
  draw2 <- c(draw, sample(rownames(a3), 1, replace = FALSE))
}
a2 <- a[draw2, ]  # pick these rows
a2
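
For more than two groups, the same guarantee can be sketched as a small helper. Note that this uses a different technique from the code above: it first draws one row per group and then fills the remaining slots at random. The function name `sample_with_each_group` and its arguments are hypothetical, made up for this illustration.

```r
# Hypothetical helper: sample n rows of df so that every group found in
# column `group_col` appears at least once in the result.
sample_with_each_group <- function(df, n, group_col) {
  groups <- split(seq_len(nrow(df)), df[[group_col]])
  stopifnot(n >= length(groups), n <= nrow(df))

  # Step 1: one guaranteed row index per group
  guaranteed <- vapply(groups,
                       function(idx) idx[sample.int(length(idx), 1)],
                       integer(1))

  # Step 2: fill the remaining slots from the rows not chosen in step 1
  rest <- setdiff(seq_len(nrow(df)), guaranteed)
  extra <- rest[sample.int(length(rest), n - length(guaranteed))]

  df[c(guaranteed, extra), ]
}

# Same toy data as above
a <- data.frame(matrix(1:36, ncol = 3), rep(c("a", "b"), times = c(10, 2)))
names(a) <- c("X1", "X2", "X3", "X4")

s <- sample_with_each_group(a, 5, "X4")
table(s$X4)
```

Forcing one row per group changes the selection probabilities compared with a simple random sample, so rows in small groups are over-represented. Whether that is acceptable depends on what the sample is used for.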
Niek