1

My question is related to my previous one Generate random variables from a distribution function using inverse sampling Now I want to generate random variables from a distribution function using inverse sampling but the sampling should be conditioned. For example, if the inverse of my cdf is :

invcdf <- function(y) a2 * log(a1/y - 1) + a3

I used inverse sampling to generate 10 rv as follows :

invcdf(runif(10))

Now, the problem is that I want the values generated greater or less than a value. How should I introduce this condition in random generator?

When I use this to have value greater than 500 :

invcdf(runif(10,500,1e6))

I get this error message : Warning message: In log((a0/y) - 1) : NaNs produced

I already try to repeat the process until having values satsifying my constraints but it is not efficient!

 repeat{
   x=invcdf(runif(1))
     if(x>100){
     break
}
Community
  • 1
  • 1
Lydie
  • 117
  • 1
  • 10

2 Answers2

2

As @spf614 noted, you'd better have checks in your function like

invcdf <- function(y) {
    if (a1 > y) {
        return( a2 * log(a1/y - 1) + a3 )
    }
    NaN
}

Then it works for all arguments

Sampling would be

low <- ...
r <- invcdf(runif(low, a1, 1e6))

UPDATE

checking for NaNs in output

nof_nans <- sum(is.nan(r))
if (nof_nans > 0) {
    ....
Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • Doing this, the function will somtimes generates NaN values? – Lydie Feb 04 '16 at 13:35
  • 1
    @Is.Fk yes, it will, and that is RIGHT thing to do. Function shall produce reasonable result for any argument, and that result could be checked. On upper layer sampling now shall call the function with only proper values, but return from function could and SHOULD be checked – Severin Pappadeux Feb 04 '16 at 13:44
1

The reason that you're getting NaNs is that R is trying to take the logarithm of a negative number. Do you want the log term to be log((a1/y)-1) or log(a1/(y-1))? You currently have the function written the first way, and when you get a very high value for y, the term a1/y approaches zero (the speed with which it approaches zero depends on the value of a1). Thus, subtracting 1 gives you a negative number inside the log function. So if the term is meant to be how you have it written (log(a1/y-1)), you simply won't be able to calculate that above certain values of y.

The simple fix is just

invcdf <- function(y){
    a2 * log(a1/(y-1)) + a3
}
spf614
  • 52
  • 6
  • the problem is that the fonction that I should use is exactly log((a1/y)-1) ! Is there a manner in this case to generate inverse sampling using my inverse cdf function and limiting data to be greater than a value ? – Lydie Feb 03 '16 at 20:32
  • What are normal values for the `a1` term? The function is only defined for values of `a1` that are greater than `y`. Otherwise, it either returns negative infinity (when `a1 == y`) or `NaN` (when `a1 < y`). So you would need to specify the parameters of `runif` such that the maximum possible value of `y` is less than `a1`. – spf614 Feb 03 '16 at 22:36
  • in my cas a1 is a parameter estimated to be equal to 1. But I need to obtain a result of the inverse sampling invcdf(runif(10)) greater than a certain value (which is at least equal to 500). Is there another manner to do it without altering the function? – Lydie Feb 04 '16 at 13:32