0

Rejection Sampling

Im working with rejection sampling with a truncated normal distribution, see r code below. How can I make the sampling stop at a specific n? for example 1000 observations. I.e. I want to stop the sampling when the number of accepted samples has reached n (1000).

Any suggestions? Any help is greatly appreciated :)

#Truncated normal curve    
curve(dnorm(x, mean=2, sd=2)/(1-pnorm(1, mean=2, sd=2)),1,9)

#create a data.frame with 100000 random values between 1 and 9

sampled <- data.frame(proposal = runif(100000,1,9))
sampled$targetDensity <- dnorm(sampled$proposal, mean=2, sd=2)/(1-pnorm(1, mean=2, sd=2))

#accept proportional to the targetDensity

maxDens = max(sampled$targetDensity, na.rm = T)
sampled$accepted = ifelse(runif(100000,0,1) < sampled$targetDensity / maxDens, TRUE, FALSE)

hist(sampled$proposal[sampled$accepted], freq = F, col = "grey", breaks = 100, xlim = c(1,9), ylim = c(0,0.35),main="Random draws from skewed normal, truncated at 1")
curve(dnorm(x, mean=2, sd=2)/(1-pnorm(1, mean=2, sd=2)),1,9, add =TRUE, col = "red", xlim = c(1,9),  ylim = c(0,0.35))



X <- sampled$proposal[sampled$accepted]

How can I set the length of X to a specific number when I sample?

Community
  • 1
  • 1
  • 1
    Do you want to perform all the calculations and then select the first 1,000 that passed, or only do calculations until 1,000 pass? The former is simple (sampled$proposal[sampled$accepted][1:1000]), the latter requires extra steps, as you'll have to check whether you've reached 1,000 passes after each sample. Unless the database is extremely large I don't see that being more efficient. – AodhanOL Jan 04 '18 at 14:16
  • I want to do the latter. Do you know how it could be performed? @AodhanOL – Hans Christensen Jan 04 '18 at 16:02
  • I'll write up a couple of ways to do it. They will be not be nice R code, I think. – AodhanOL Jan 04 '18 at 20:38
  • Is it necessary that you use rejection sampling? There are reasonably easy ways to do it without rejection. – Glen_b Jan 05 '18 at 07:26

1 Answers1

0

After sleeping on it, if you're determined to use rejection sampling and only doing it until 1,000 have passed, I don't think there's a better option than just using a while loop. This is significantly less efficient than

sampled$accepted = ifelse(runif(100000,0,1) < sampled$targetDensity / maxDens, TRUE, FALSE)
X <- sampled$proposal[sampled$accepted][1:1000]

The time taken for the above code is 0.0624001s. The time taken for the code below is 0.780005s. I include it because it is the answer to the specific question you've asked, but the approach is inefficient. If there's another option I'd use that.

#Number of samples
N_Target <- 1000
N_Accepted <- 0

#Loop until condition is met
i = 1
sampled$accepted = FALSE
while( N_Accepted < N_Target ){

    sampled$accepted[i] = ifelse(runif(1,0,1) < sampled$targetDensity[i] / maxDens, TRUE, FALSE)
    N_Accepted = ifelse( sampled$accepted[i], N_Accepted + 1 , N_Accepted )
    i = i + 1
    if( i > nrow( sampled ) ) break

}
AodhanOL
  • 630
  • 7
  • 26
  • I see what you mean with the efficiency. However, the latter option does not seem to return a vector with 1000 observations in my case. When I write length(sampled$accepted) i get 100000. length(sampled$proposal[sampled$accepted]) gives me 755. – Hans Christensen Jan 05 '18 at 14:39
  • Which vector has 1000 observations? – Hans Christensen Jan 05 '18 at 14:39
  • That suggests that only 755 passed the test (which seems unusual). What's the result of sum(sampled$accepted)? For me it's 43364 on the data I created. That's also what I get when I run lenght(sampled$proposal[sampled$accepted]). – AodhanOL Jan 05 '18 at 14:44
  • Well if I change N_Target to 500 I only get a vector of 422 observations. So I'm not sure the loop returns a Vector of 1000 observations, rather it stops when 1000 has been tested, of which about 750 are accepted. I want it to stop when 1000 accepted observations are obtained, not when 1000 are tested. @AodhanOL – Hans Christensen Jan 06 '18 at 15:34
  • That's strange behaviour. I get it too - let me see if I can figure out why. – AodhanOL Jan 06 '18 at 16:41
  • A stupid error on my part, N_Accepted was increasing if accepted was FALSE instead of TRUE. I've edited the answer to be correct. – AodhanOL Jan 06 '18 at 16:44