Python plotting percentile contour lines of a probability distribution

Question

Given a probability distribution with unknown functional form (example below), I like to plot "percentile-based" contour lines, i.e.,those that correspond to regions with an integral of 10%, 20%, ..., 90% etc.

## example of an "arbitrary" probability distribution ##
from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.mgrid[-3:3:100j, -3:3:100j]
z1 = bivariate_normal(X, Y, .5, .5, 0., 0.)
z2 = bivariate_normal(X, Y, .4, .4, .5, .5)
z3 = bivariate_normal(X, Y, .6, .2, -1.5, 0.)
z = z1+z2+z3
plt.imshow(np.reshape(z.T, (100,-1)), origin='lower', extent=[-3,3,-3,3])
plt.show()

I've looked into multiple approaches, from using the default contour function in matplotlib, methods involving stats.gaussian_kde in scipy, and even perhaps generating random point samples from the distribution and estimating a kernel afterwards. None of them appears to provide the solution.

Your question is ill-posed. There are infinitely many ways you could divide your example picture so that, for example, each side of the division has an integral of 50%. What division do you want? It sounds like you want contour lines - but only those that correspond to regions with an integral of 10%, 20%, ..., 90% for example. Is that correct? — Timothy Shields, Jun 20 '16 at 17:30
@TimothyShields Thanks for the clarification. What you have better stated is indeed what I want. — neither-nor, Jun 20 '16 at 21:21

score 17 · Accepted Answer · answered Jun 20 '16 at 22:04

17

Look at the integral of p(x) inside the contour p(x) ≥ t and solve for the desired value of t:

import matplotlib
from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.mgrid[-3:3:100j, -3:3:100j]
z1 = bivariate_normal(X, Y, .5, .5, 0., 0.)
z2 = bivariate_normal(X, Y, .4, .4, .5, .5)
z3 = bivariate_normal(X, Y, .6, .2, -1.5, 0.)
z = z1 + z2 + z3
z = z / z.sum()

n = 1000
t = np.linspace(0, z.max(), n)
integral = ((z >= t[:, None, None]) * z).sum(axis=(1,2))

from scipy import interpolate
f = interpolate.interp1d(integral, t)
t_contours = f(np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]))
plt.imshow(z.T, origin='lower', extent=[-3,3,-3,3], cmap="gray")
plt.contour(z.T, t_contours, extent=[-3,3,-3,3])
plt.show()

answered Jun 20 '16 at 22:04

Timothy Shields

75,459
18
120
173

2

Thank you for the perfect solution! However, I'm completely lost trying to understand the line ``integral = ((z >= t[:, None, None]) * z).sum(axis=(1,2))``. Also, is there a way to label the contour lines with 0.9, 0.8, 0.7, and so on? – neither-nor Jun 22 '16 at 18:29
4

@neither-nor `z` is a 2D array representing the distribution p(x). `t` is a 1D array of different thresholds between `0` and `z.max()`. `mask = (z >= t[:, None, None])` is a 3D array of shape `t.shape + z.shape` where each `mask[i]` is a 2D array of 0/1 values where 1's are inside the contour p(x) >= t[i]. `integral = (mask * z).sum(axis=(1,2))` is a 1D array of the integrals over those regions, where `integral[i]` is the integral of p(x) over the region p(x) >= t[i]. – Timothy Shields Jun 22 '16 at 19:56

score -5 · Answer 2 · answered Jun 18 '16 at 07:00

You can do something like this:

from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import numpy as np

X, Y = np.mgrid[-3:3:100j, -3:3:100j]

sigma = 0.5

z = bivariate_normal(X,Y,.5, .5, 0., 0.)
z1 = bivariate_normal(0, 1 * sigma, sigma, sigma, 0.0, 0.0)
z2 = bivariate_normal(0, 2 * sigma, sigma, sigma, 0.0, 0.0)
z3 = bivariate_normal(0, 3 * sigma, sigma, sigma, 0.0, 0.0)

plt.imshow(z, interpolation='bilinear', origin='lower', extent=[-3,3,-3,3])
contour = plt.contour(z,[z1,z2,z3],origin='lower',extent=[-3,3,-3,3],colors='yellow')
plt.show()

Which gives:

These contour lines correspond to the height of the distribution, not the amount of volume within them. — neither-nor, Jun 18 '16 at 18:33

Python plotting percentile contour lines of a probability distribution

2 Answers2