2

The probability distribution of the sum of two random variables, x and y, is given by the convolution of the individual distributions. I'm having some trouble doing this numerically. In the following example, x and y are uniformly distributed, with their respective distributions approximated as histograms. My reasoning says that the histograms should be convoluted to give the distribution of, x+y.

from numpy.random import uniform
from numpy import ceil,convolve,histogram,sqrt
from pylab import hist,plot,show

n = 10**2

x,y = uniform(-0.5,0.5,n),uniform(-0.5,0.5,n)

bins = ceil(sqrt(n))

pdf_x = histogram(x,bins=bins,normed=True)
pdf_y = histogram(y,bins=bins,normed=True)

s = convolve(pdf_x[0],pdf_y[0])

plot(s)
show()

which gives the following,

enter image description here

In other words, a triangular distribution, as expected. However, I have no idea how to find the x-values. I would appreciate it if someone could correct me here.

lafras
  • 8,712
  • 4
  • 29
  • 28
  • How would the `x`-values be correct if you didn't even specify them in the plot? Also, strictly speaking, `histogram` wont give you a `pdf` so simply way. But please first consider the number of your `bins` respect to `n`. Thanks – eat Jun 29 '11 at 19:22

1 Answers1

5

In order to still move on (towards more murky details), I further adapted your code like this:

from numpy.random import uniform
from numpy import convolve, cumsum, histogram, linspace

s, e, n= -0.5, 0.5, 1e3
x, y, bins= uniform(s, e, n), uniform(s, e, n), linspace(s, e, n** .75)
pdf_x= histogram(x, normed= True, bins= bins)[0]
pdf_y= histogram(y, normed= True, bins= bins)[0]
c= convolve(pdf_x, pdf_y); c= c/ c.sum()
bins= linspace(2* s, 2* e, len(c))
# a simulation
xpy= uniform(s, e, 10* n)+ uniform(s, e, 10* n)
c2= histogram(xpy, normed= True, bins= bins)[0]; c2= c2/ c2.sum()

from pylab import grid, plot, show, subplot
subplot(211), plot(bins, c)
plot(linspace(xpy.min(), xpy.max(), len(c2)), c2, 'r'), grid(True)
subplot(212), plot(bins, cumsum(c)), grid(True), show()

Thus, giving plots something like this: enter image description here Where the upper part represents the PDF (blue line), which indeed looks quite triangular and the simulation (red dots), which reflects the triangular shape. Lower part represents the CDF, which also looks to follow nicely the expected S-curve.

eat
  • 7,440
  • 1
  • 19
  • 27
  • Thanks for the answer. Going according to the Wikipedia article on [convolution](http://en.wikipedia.org/wiki/Convolution), I would expect the new x range to run from -1 to 1. That's the part I'm struggling with. – lafras Jun 29 '11 at 20:23
  • This (pdf_x= histogram(x)**[0]**) ignores the bin "locations" for the histograms, converting the histogram into a pdf, before convolution. So for real data, make sure your two data set histograms are "aligned" to have exactly the same bins (using that `bins=bins` argument) before doing this. You'd fill any nonoverlapping gaps or margins with zero counts (empty bins). Otherwise the resulting PDF (and CDF) will be wrong even though they look like S curves and triangles. – hobs Apr 10 '13 at 22:59