6

I would like to apply the Rejection sampling method to simulate a random vector Y=(Y_1, Y_2) of a uniform distribution from a unit disc D = { (X_1 , X_2) \in R^2: \sqrt{x^2_1 + x^2_2} ≤ 1} such that X = (X_1 , X_ 2) is random vector of a uniform distribution in the square S = [−1, 1]^2 and the joint density f(y_1,y_2) = \frac{1}{\pi} 1_{D(y_1,y_2)}.

In the rejection method, we accept a sample generally if f(x) \leq C * g(x). I am using the following code to :

x=runif(100,-1,1)
y=runif(100,-1,1)

d=data.frame(x=x,y=y)
disc_sample=d[(d$x^2+d$y^2)<1,]
plot(disc_sample)

I have two questions:

{Using the above code, logically, the size of d should be greater than the size of disc_sample but when I call both of them I see there are 100 elements in each one of them. How could this be possible. Why the sizes are the same.} THIS PART IS SOLVED, thanks to the comment below.

The question now

Also, how could I reformulate my code to give me the total number of samples needed to get 100 samples follow the condition. i.e to give me the number of samples rejected until I got the 100 needed sample?

Thanks to the answer of r2evans but I am looking to write something simpler, a while loop to store all possible samples inside a matrix or a data frame instead of a list then to call from that data frame just the samples follow the condition. I modified the code from the answer without the use of the lists and without sapply function but it is not giving the needed result, it yields only one row.

i=0
samps <- data.frame()
goods <- data.frame()
nr <- 0L
sampsize <- 100L
needs <- 100L
while (i < needs) {
  samps <- data.frame(x = runif(1, -1, 1), y = runif(1, -1, 1))
  goods <- samps[(samps$x^2+samps$y^2)<1, ]
i = i+1
}

and I also thought about this:

i=0
j=0
samps <- matrix()
goods <- matrix()
needs <- 100

while (j < needs) {
  samps[i,1] <- runif(1, -1, 1)
  samps[i,2] <- runif(1, -1, 1)
  if (( (samps[i,1])**2+(samps[i,2])**2)<1){
  goods[j,1] <- samps[i,1]
  goods[j,2] <- samps[i,2]
}
else{
i = i+1
}
}

but it is not working.

I would be very grateful for any help to modify the code.

  • Do you see 100 elements in them, or do you see a row named `100` in both of them? When I run this (preceded by `set.seed(42)`), I see `nrow(disc_sample)` is 79. – r2evans Mar 03 '20 at 00:56
  • @r2evans Yes you are right. Could you please help me to get 100 samples instead of 79. i.e What I can do to let the code keep running until I get 100 samples follow the condition in addition to giving the total number of samples needed to get this number? – Sophie Allan Mar 03 '20 at 01:06
  • When asking questions that include randomness, it's important to include your random see. I cannot reproduce your 79, but I can give something (answer below) that you should be able to reproduce on your system. – r2evans Mar 03 '20 at 01:14
  • *"but it is not working"* is unhelpful. What does not work? Errors? Warnings? Wrong numbers? Do you ever update `j` or `needs`, or do you really want to run your loop forever? – r2evans Mar 03 '20 at 16:09

1 Answers1

1

As to your second question ... you cannot reformulate your code to know precisely how many it will take to get (at least) 100 resulting combinations. You can use a while loop and concatenate results until you have at least 100 such rows, and then truncate those over 100. Because using entropy piecewise (at scale) is "expensive", you might prefer to always over-estimate the rows you need and grab all at once.

(Edited to reduce "complexity" based on homework constraints.)

set.seed(42)
samps <- vector(mode = "list")
goods <- vector(mode = "list")
nr <- 0L
iter <- 0L
sampsize <- 100L
needs <- 100L
while (nr < needs && iter < 50) {
  iter <- iter + 1L
  samps[[iter]] <- data.frame(x = runif(sampsize, -1, 1), y = runif(sampsize, -1, 1))
  rows <- (samps[[iter]]$x^2 + samps[[iter]]$y^2) < 1
  goods[[iter]] <- samps[[iter]][rows, ]
  nr <- nr + sum(rows)
}
iter                # number of times we looped
# [1] 2
out <- head(do.call(rbind, goods), n = 100)
NROW(out)
# [1] 100
head(out) ; tail(out)
#            x          y
# 1  0.8296121  0.2524907
# 3 -0.4277209 -0.5668654
# 4  0.6608953 -0.2221099
# 5  0.2834910  0.8849114
# 6  0.0381919  0.9252160
# 7  0.4731766  0.4797106
#               x          y
# 221 -0.65673577 -0.2124462
# 231  0.08606199 -0.7161822
# 251 -0.37263236  0.1296444
# 271 -0.38589120 -0.2831997
# 28  -0.62909284  0.6840144
# 301 -0.50865171  0.5014720
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • @r2evans thanks a lot for you answer. I tried without success to write something simpler, just a while loop to store all possible samples inside a matrix instead of a list then to call from that matrix just the samples follow the condition. Do you have any suggestions how I can simplify your code without list and without sapply function? – Sophie Allan Mar 03 '20 at 11:54
  • something like the code in the edit of the question above. – Sophie Allan Mar 03 '20 at 12:19
  • Okay, so I reversed the logic (`>` instead of `<`). That notwithstanding, the result is a single frame (not a list) with exactly 100 rows that meet the condition. Isn't that what you need? – r2evans Mar 03 '20 at 13:42
  • thankfully your answer give the required result but the problem that I am limited to few simple functions, i.e that I need to write the code without the use of the functions `list`, `with`, `sapply`. So I tired to simplify your code as in the edit above but now it giving only one row – Sophie Allan Mar 03 '20 at 14:15
  • Yeah, those "constraints" were not in your original question. The use of `sapply` was solely to demonstrate something, it has nothing to do with the functionaliy (note the difference between *doing* and *showing* something). `with` is a convenience to reduce code, you can easily replace `with(...)` with `(samps[[iter]]$x^2 + samps[[iter]]y^2) > 1`. Can't use `list`? What about `vector(mode="list")`? ( Are you sure you can use R? Can you use `c`? :-) (I'm tiring of homework assignments that teach bad practices.) – r2evans Mar 03 '20 at 16:01
  • Yes now it is more simple. Would you please tell me why you included (iter < 50) in the test expression and why it is bounded by 50 ? thank a lot – Sophie Allan Mar 03 '20 at 18:36
  • To prevent an infinite `while` loop. The number 50 is completely arbitrary, it was just something so that a typo did not result in a never-ending loop. – r2evans Mar 03 '20 at 19:05
  • @r2evans hi again. Here the samples are stored in goods which is a vector of lists. How would we plot the results in this case? – Sophie Allan Mar 19 '20 at 12:21
  • The samples are stored in `out`, which had two columns, `x` and `y`. Plotting is a new question, but should really be as simple as `plot(y~x, data=out)`. – r2evans Mar 19 '20 at 13:53