
I am building a collection of functions which return probability density functions (pdfs) from the pdfs of two independent random variables.

The most common example of this would be the sum of independent random variables X, Y which is given by the convolution of their pdfs.

Following this post, I defined the following function which takes as arguments a pair of pdfs, and returns their convolution:

dSumXY <- function(dX, dY){

  # Create convolution of distributions.
  dReturn <- function(z){
    integrate( function(x,z){
      dX(x) * dY(z - x) },
      -Inf, Inf, z)$value
  }

  # Vectorize convolution.
  dReturn <- Vectorize(dReturn)

  return(dReturn)  
}

This works as expected in the following example:

# Define pdfs of two (identical) uniform [-1,1] distributions
unifX <- function(x) dunif(x, min = -1, max = 1)
unifY <- function(x) dunif(x, min = -1, max = 1)

# Find convolution of their pdfs.
convXY <- dSumXY(unifX, unifY)

# Plot the convolved pdf.
plot(seq(-3,3,by = 0.1), convXY(seq(-3,3,by = 0.1) ), type = 'l')

# Sample from the distribution
convSample <- runif(10000, min = -1, max = 1) + runif(10000, min = -1, max = 1)

# Plot density of sample.
lines( density(convSample) , col = "red" )

More generally, this works for a number of combinations of pairs of uniform distributions; however, when I try convolving a pair of Uniform[1,2] distributions I do not get the true result:

# Define pdfs of two (identical) uniform [1,2] distributions
unifX2 <- function(x) dunif(x, min = 1, max = 2)
unifY2 <- function(x) dunif(x, min = 1, max = 2)

# Find convolution of their pdfs.
convXY2 <- dSumXY(unifX2, unifY2)

# Plot the convolved pdf.
plot(seq(1,5,by = 0.1), convXY2(seq(1,5,by = 0.1) ), type = 'l')

# Sample from the distribution
convSample2 <- runif(10000, min = 1, max = 2) + runif(10000, min = 1, max = 2)

# Plot density of sample.
lines( density(convSample2) , col = "red" )

In particular (with a bit of probabilistic knowledge!) it is clear that the pdf of the sum of two Uniform[1,2] variables should be non-zero at 3.75; specifically, it should equal 1/4. However:

# This should be equal to 1/4, but returns 0.
convXY2(3.75)

I have tried this on two separate machines and replicated the same issue, so I am interested to see where it is arising from.

Thanks in advance.

owen88

2 Answers


The problem is coming from the integrate function, which struggles with the fact that the integrand has compact support while being asked to integrate over an infinite range. If you change the range of integration to the range where the function has support (i.e. [1,2]), then it works no problem.
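A minimal sketch of that fix (the name dSumXYFinite and the explicit lower/upper arguments are my own, not from the question's original function):

```r
# Variant of dSumXY that integrates over a caller-supplied finite
# range [lower, upper] covering the support of dX, instead of (-Inf, Inf).
dSumXYFinite <- function(dX, dY, lower, upper) {
  dReturn <- function(z) {
    integrate(function(x) dX(x) * dY(z - x), lower, upper)$value
  }
  Vectorize(dReturn)
}

unifX2 <- function(x) dunif(x, min = 1, max = 2)
convXY2fixed <- dSumXYFinite(unifX2, unifX2, 1, 2)
convXY2fixed(3.75)  # ~0.25, as expected
```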

Rob
  • So that is something I considered, but then I can't see why it can handle some examples (eg. the first case of Uniform[-1,1]), but not others. Why would it be that it can handle some functions with compact support, but not others? – owen88 Apr 28 '17 at 20:16
  • If you repeat the calculation using Uniform(x, x+1), then it works when -1 < x < 0 – Rob Apr 28 '17 at 20:57
  • Hi Rob, so I agree that for many choices of x (in your example) it works, and I think your observation is correct: -1 < x < 0 is fine. But do we actually have any evidence, rather than conjecture, as to why this is the only definition of uniform variables that works? It seems as though either: a) there should be a non-hypothesised explanation of what is happening, or b) this is a bug that should be reported. – owen88 Apr 28 '17 at 21:44
  • I don't think this is a bug; in general I don't think it is possible to numerically integrate functions with compact support on (-Inf, Inf), e.g. consider x = 10^1000 in my example above. Look at the documentation for [QUADPACK](http://www.netlib.org/quadpack/doc), the underlying library on which integrate is based, which suggests breaking the integral at any discontinuities or derivative singularities. Sometimes it may work, but I think this is just luck. – Rob Apr 29 '17 at 23:12
  • I did read the help for integrate, and it does in fact highlight that when integrating a function which "is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong". What I find surprising is that "nearly all" includes the case of convolving two Unif[1,2] variables over the range [0,6]. I personally wouldn't call 5/6 of a domain "nearly all" of it... but I guess that's just me! Thanks for highlighting this stability issue... now the question is how I can get around it! – owen88 Apr 30 '17 at 06:36
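The failure mode discussed in these comments can be seen directly with a one-off integrand (the helper f below is illustrative, not from the original posts):

```r
# Integrand of the convolution at z = 3.75: it equals 1 on [1.75, 2]
# and 0 elsewhere, so its true integral is 0.25.
f <- function(x) dunif(x, 1, 2) * dunif(3.75 - x, 1, 2)

# Over (-Inf, Inf) the adaptive quadrature can miss the narrow support
# entirely and report 0.
integrate(f, -Inf, Inf)$value

# Restricting to the support (equivalently, breaking the integral at the
# discontinuities, as the QUADPACK documentation suggests) recovers it.
integrate(f, 1, 2)$value  # ~0.25
```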

dunif deals in whole numbers...

dunif(.9, 1, 2)
[1] 0

dunif(1, 1, 2)
[1] 1

dunif(1.5, 1, 2)
[1] 1

dunif(2, 1, 2)
[1] 1

dunif(2.1, 1, 2)
[1] 0

If you only use whole numbers for your function and then interpolate, you get the expected answer.

convXY2(2)
[1] 0

convXY2(3)
[1] 1

convXY2(4)
[1] 0

z <- 3.75
remainder <- z %% 1
convXY2(floor(z))*(1-remainder) + convXY2(ceiling(z))*(remainder) 
[1] 0.25

Your function works correctly for other continuous distributions, so perhaps you just need to account for this specific situation in a wrapper function. Alternatively, use something other than dunif, which effectively divides your probability based on the precision you give it:

dunif2 <- function(x, min, max) {
  if (x >= min & x <= max) {
    # Spread the density according to the number of decimal places supplied.
    return(10^-nchar(strsplit(as.character(x), "\\.")[[1]][[2]]))
  } else {
    return(0)
  }
}

dunif2(1.45, 1, 2)
[1] 0.01

dunif2(1.4, 1, 2)
[1] 0.1
Gladwell
  • Wouldn't this mean that convXY2(3.5) would return 0? But it does actually return 0.5, which is the value to be expected. – owen88 Apr 28 '17 at 20:15
  • This is true. All I can say in support is it's a simpler fix that doesn't require changes to your dSumXY function. – Gladwell Apr 30 '17 at 11:16