I have a question similar to this one:

Weighted sampling with 2 vectors

I now have a dataset which contains 1000 observations and 4 columns for each observation. I want to sample 200 observations from the original dataset with replacement.

But the problem is that I need to assign a different probability vector to each column. For example, for the first column I want equal probabilities, c(0.001,0.001,0.001,0.001...). For the second column I want something different, like c(0.0005,0.0002,......). Of course, each probability vector sums to 1.

I know that sample can handle a single probability vector, but I am not sure how to do this with several. Please help!

Thank you in advance! Colamonkey

Community

1 Answer

Data frame with the sampling probabilities (one column of probabilities per column of data):

# In your case there are 1000 rows and 4 columns;
# this small example just shows the procedure.
samp_prob <- data.frame(A = rep(.25, 4), B = c(.5, .1, .2, .2), C = c(.3, .6, .05, .05))
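
Since each column of samp_prob is meant to be a probability vector, it may be worth checking the columns before sampling. Below is a minimal sketch, assuming the probabilities are stored in a data frame like the one above; the tolerance 1e-8 and the name col_sums are arbitrary choices made for illustration. Note that sample() accepts weights that do not sum to 1, so a mistake here would otherwise go unnoticed.

```r
samp_prob <- data.frame(A = rep(.25, 4),
                        B = c(.5, .1, .2, .2),
                        C = c(.3, .6, .05, .05))

# Every weight must be non-negative
stopifnot(all(as.matrix(samp_prob) >= 0))

# Each column should sum to 1, up to floating-point error
# (1e-8 is an assumed tolerance, not a required value)
col_sums <- colSums(samp_prob)
stopifnot(all(abs(col_sums - 1) < 1e-8))
```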

Data frame of values to sample from (with replacement):

df <- data.frame(a = 1:4, b = 2:5, c = 3:6)

Sampling (each column is drawn independently, using its own probability vector):

sam <- mapply(function(x, y) sample(x, size = 200, replace = TRUE, prob = y), df, samp_prob)
head(sam)
     a b c
[1,] 4 5 6
[2,] 1 2 4
[3,] 1 2 4
[4,] 4 4 4
[5,] 4 4 4
[6,] 1 2 4

# You can also write (it is equivalent):
mapply(df, samp_prob, FUN = sample, size = 200, replace = TRUE)
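
To see whether the sampled values follow the requested probabilities, one option is to draw a larger sample and compare observed frequencies with the probability vector. This is only a sketch: the seed, the sample size n, and the name obs_b are assumptions made for illustration, and MoreArgs is used here to pass the arguments shared by every column.

```r
set.seed(42)  # arbitrary seed, only so that the run is reproducible

df <- data.frame(a = 1:4, b = 2:5, c = 3:6)
samp_prob <- data.frame(A = rep(.25, 4),
                        B = c(.5, .1, .2, .2),
                        C = c(.3, .6, .05, .05))

n <- 10000  # larger than 200 so the observed frequencies settle down

# Column i of df is sampled with column i of samp_prob as its weights
sam <- mapply(sample, df, samp_prob,
              MoreArgs = list(size = n, replace = TRUE))

# Observed share of each value in column b, next to the requested weights
obs_b <- as.numeric(table(factor(sam[, "b"], levels = df$b))) / n
round(rbind(requested = samp_prob$B, observed = obs_b), 3)
```

Because every column is drawn on its own, a row of sam does not correspond to any single row of df.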
Davide Passaretti