1

I'm trying to answer the following question using a simple Monte Carlo sampling procedure in R: An urn contains 10 balls. Two red, three white, and five black. All 10 are drawn one at a time without replacement. Find the probability that both the first and last balls drawn are black.

I've tried two approaches and neither work.

Here's the longer approach that's more intuitive to me:

balls <- c(1:10) #Consider 1-5 black, 6-8 white, and 9-10 red.

pick.ball <- function(balls){
sample(x = balls, 1, replace = FALSE)
}

experiment <- function(n){
picks = NULL
keep <- NULL
for(j in 1:n){
   for(i in 1:10){
   picks[i] <- pick.ball(balls = balls)
   }
keep[j] <- ifelse(picks[1] == any(1:5) & picks[10] == any(1:5), 1, 0)
 }
return(length(which(keep == 1))/n)
}

Here's my second, simpler approach which shows my lack of understanding of the repeat loop. Don't bother running it--it just goes on forever. But if someone could help me better understand why, that would be appreciated!

balls <- c(1:10) #Consider 1-5 black, 6-8 white, and 9-10 red.

pick.ball <- function(balls, n){
  keep = NULL
  for(i in 1:n){
  picks <- sample(x = balls, 10, replace = FALSE)
  keep[i] <- ifelse(picks[1] == any(1:5) & picks[10] == any(1:5), 1, 0)
  repeat{
    picks
    if(length(keep) == n){
      break
      }
    }
  }
  return(which(keep == 1)/n)
}
Michael
  • 111
  • 9

1 Answers1

2

Here is a loop that I created. You can wrap it up in a function if you want. Instead of numbering the balls, I am using letters.

urn <- c(rep("B", 5), rep("W", 3), rep("R", 2))

# Set the number of times you want to run the loop

nloops <- 10000


# Create an empty data frame to hold the outcomes of the simulation

m <- structure(list(first = character(),
                    last = character(),
                    is_black = integer()),
               class = "data.frame")

Now run the loop


set.seed(456)
for (j in 1:nloops) {
  b <- sample(urn, 10, replace = FALSE)
  m[j, 1:2 ] <- b[c(1, 10)] 
  m[j, 3] <- ifelse(m[j, 1] == "B" & m[j, 2] == "B", 1, 0)
}

head(m)
  first last is_black
1     B    W        0
2     B    B        1
3     B    B        1
4     R    B        0
5     B    R        0
6     R    W        0

Finally, the answer:

# Proportion of cases where first and last ball drawn were black

sum(m[ , 3]) / nloops

# This was 0.22
  • I understood what you did there. Very straight forward and elegant. Could you help explain where my first attempt failed? You're storing the samples in a data frame. I am just trying to store the desired outcomes in a vector. There's a difference there where your idea works and mine doesn't, and I'm just wondering why that is?? – Michael Apr 15 '19 at 02:26
  • In your first attempt, `balls` is not getting updated. After every draw, `balls` should reduce in length because you are sampling without replacement. I modified your loops by adding `balls2` as the vector to draw from. ` for(j in 1:n){ balls2 <- balls for(i in 1:10){ picks[i] <- pick.ball(balls = balls2) balls2 <- balls2[balls2 != picks[i]] } keep[j] <- ifelse(picks[1] == any(1:5) & picks[10] == any(1:5), 1, 0) }` – Ashwin Malshe Apr 15 '19 at 14:55
  • I did fix that at some point. I thought I had posted the fixed version. Oops. When I did fix that though, the j loop still wasn't working. The keep vector wasn't filling up with any results from picks i loop. Any idea why? – Michael Apr 15 '19 at 15:30