11

Given a probability distribution with unknown functional form (example below), I like to plot "percentile-based" contour lines, i.e.,those that correspond to regions with an integral of 10%, 20%, ..., 90% etc.

## example of an "arbitrary" probability distribution ##
from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.mgrid[-3:3:100j, -3:3:100j]
z1 = bivariate_normal(X, Y, .5, .5, 0., 0.)
z2 = bivariate_normal(X, Y, .4, .4, .5, .5)
z3 = bivariate_normal(X, Y, .6, .2, -1.5, 0.)
z = z1+z2+z3
plt.imshow(np.reshape(z.T, (100,-1)), origin='lower', extent=[-3,3,-3,3])
plt.show()

enter image description here I've looked into multiple approaches, from using the default contour function in matplotlib, methods involving stats.gaussian_kde in scipy, and even perhaps generating random point samples from the distribution and estimating a kernel afterwards. None of them appears to provide the solution.

neither-nor
  • 1,245
  • 2
  • 17
  • 30
  • Your question is ill-posed. There are infinitely many ways you could divide your example picture so that, for example, each side of the division has an integral of 50%. What division do you want? It sounds like you want contour lines - but only those that correspond to regions with an integral of 10%, 20%, ..., 90% for example. Is that correct? – Timothy Shields Jun 20 '16 at 17:30
  • @TimothyShields Thanks for the clarification. What you have better stated is indeed what I want. – neither-nor Jun 20 '16 at 21:21

2 Answers2

17

Look at the integral of p(x) inside the contour p(x) ≥ t and solve for the desired value of t:

import matplotlib
from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.mgrid[-3:3:100j, -3:3:100j]
z1 = bivariate_normal(X, Y, .5, .5, 0., 0.)
z2 = bivariate_normal(X, Y, .4, .4, .5, .5)
z3 = bivariate_normal(X, Y, .6, .2, -1.5, 0.)
z = z1 + z2 + z3
z = z / z.sum()

n = 1000
t = np.linspace(0, z.max(), n)
integral = ((z >= t[:, None, None]) * z).sum(axis=(1,2))

from scipy import interpolate
f = interpolate.interp1d(integral, t)
t_contours = f(np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]))
plt.imshow(z.T, origin='lower', extent=[-3,3,-3,3], cmap="gray")
plt.contour(z.T, t_contours, extent=[-3,3,-3,3])
plt.show()

enter image description here

Timothy Shields
  • 75,459
  • 18
  • 120
  • 173
  • 2
    Thank you for the perfect solution! However, I'm completely lost trying to understand the line ``integral = ((z >= t[:, None, None]) * z).sum(axis=(1,2))``. Also, is there a way to label the contour lines with 0.9, 0.8, 0.7, and so on? – neither-nor Jun 22 '16 at 18:29
  • 4
    @neither-nor `z` is a 2D array representing the distribution p(x). `t` is a 1D array of different thresholds between `0` and `z.max()`. `mask = (z >= t[:, None, None])` is a 3D array of shape `t.shape + z.shape` where each `mask[i]` is a 2D array of 0/1 values where 1's are inside the contour p(x) >= t[i]. `integral = (mask * z).sum(axis=(1,2))` is a 1D array of the integrals over those regions, where `integral[i]` is the integral of p(x) over the region p(x) >= t[i]. – Timothy Shields Jun 22 '16 at 19:56
-5

You can do something like this:

from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import numpy as np

X, Y = np.mgrid[-3:3:100j, -3:3:100j]

sigma = 0.5

z = bivariate_normal(X,Y,.5, .5, 0., 0.)
z1 = bivariate_normal(0, 1 * sigma, sigma, sigma, 0.0, 0.0)
z2 = bivariate_normal(0, 2 * sigma, sigma, sigma, 0.0, 0.0)
z3 = bivariate_normal(0, 3 * sigma, sigma, sigma, 0.0, 0.0)

plt.imshow(z, interpolation='bilinear', origin='lower', extent=[-3,3,-3,3])
contour = plt.contour(z,[z1,z2,z3],origin='lower',extent=[-3,3,-3,3],colors='yellow')
plt.show()

Which gives:

enter image description here

Srivatsan
  • 9,225
  • 13
  • 58
  • 83