R - Sampling Two Correlated Variables

Question

I have two multinomial variables (e.g. age group and color).

ageGroup <- c(35,40,45,50)
color    <- c("Red", "Blue", "Yellow")

I want to be able to draw these two variables for 100 observations with equal probability.

n <- 100
age       <- sample(ageGroup, n, replace = TRUE)
colorDraw <- sample(color,    n, replace = TRUE)  # separate name so `color` is not overwritten

Now suppose an observed frequency table shows that ages 35 and 40 never occur with 'Red'. How do I sample so that these two age groups draw 'Blue' and 'Yellow' with equal probability, and never 'Red'?

Should I split the sampling by age group, or is there a more sophisticated statistical approach?

Thanks!

score 2 · Answer 1 · answered Oct 05 '18 at 00:45

Here's one approach, though I'm not sure it meets your "with equal probability" requirement. I've set it up so that each "allowable" color-ageGroup combination is drawn with equal probability.

# sample data
ageGroup <- c(35,40,45,50)
color    <- c("Red", "Blue", "Yellow")

# get all combinations of ageGroup and color
df <- expand.grid(ageGroup, color)
names(df) <- c("ageGroup", "color")

# remove red-35 and red-40
subdf <- df[!(df$color=="Red" & df$ageGroup %in% c(35, 40)), ]

# sample from the remaining combinations, each with equal probability
N <- nrow(subdf)
result <- subdf[sample(seq_len(N), 100, replace = TRUE), ]
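
One thing to be aware of: with 10 allowable combinations each drawn with probability 1/10, ages 35 and 40 come up with probability 2/10 each, while 45 and 50 come up with probability 3/10 each. If "equal probability" is instead meant as equal probability for each age group, followed by equal probability among the colors allowed for that age, a conditional draw might be closer to what the question asks. The following is only a sketch under that assumption; `allowed_colors`, `colors`, `colorDraw` and `result2` are names made up for illustration, and the seed is arbitrary.

```r
# Sketch: draw age uniformly, then draw color uniformly from the colors
# allowed for that age (assumed reading of "equal probability").
set.seed(1)  # arbitrary seed, only for reproducibility

ageGroup <- c(35, 40, 45, 50)
colors   <- c("Red", "Blue", "Yellow")
n        <- 100

# hypothetical helper: colors permitted for a given age
allowed_colors <- function(a) {
  if (a %in% c(35, 40)) setdiff(colors, "Red") else colors
}

# step 1: every age group is equally likely
age <- sample(ageGroup, n, replace = TRUE)

# step 2: one color per observation, conditional on its age
colorDraw <- vapply(
  age,
  function(a) {
    choices <- allowed_colors(a)
    choices[sample.int(length(choices), 1)]
  },
  character(1)
)

result2 <- data.frame(ageGroup = age, color = colorDraw)

# cross-tabulate to confirm Red never appears with 35 or 40
table(result2$ageGroup, result2$color)
```

This is essentially the "split the sampling by age group" idea from the question. Which of the two versions is appropriate depends on whether the combinations or the age groups should be uniform.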
