Python - integrate 2D Kernel Density Estimation within contour lines

Question

I would like to draw a contour plot of a Kernel Density Estimation, where the KDE is integrated within each of the contour plot filled areas.

As an example, imagine I calculate the KDE of 2D data:

data = np.random.multivariate_normal((0, 0), [[1, 1], [2, 0.7]], 100)
x = data[:, 0]
y = data[:, 1]
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([xx.ravel(), yy.ravel()])
values = np.vstack([x, y])
kernel = st.gaussian_kde(values)
f = np.reshape(kernel(positions).T, xx.shape)

I know how to draw the contourplot of the KDE.

fig = plt.figure()
ax = fig.gca()
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
cfset = ax.contourf(xx, yy, f, cmap='Blues')
cset = ax.contour(xx, yy, f, colors='k')
plt.show()

However, this contourplot shows what is the probability density within each of the filled areas. Instead, I would like the plot to indicate the total probability to fall within each of the filled areas.

Paul Panzer · Accepted Answer · 2017-01-13T22:20:16.920

1

Please note that the following is only correct as long as your contours are 'monotonic', i.e. inside a contour line you find only pixel values above the correspoonding contour level. Also note that if your density is multipeaked corresponding areas in separate peaks are lumped together.

If this is true/acceptable your problem can be solved by ordering the pixels by value.

I don't know by which heuristic your plot program chooses its contour levels, but assuming you have them stored (in ascending order, say) in a variable called 'levels' you could try something like

ff = f.ravel()
order = np.argsort(ff)
fsorted = ff[order]
F = np.cumsum(fsorted)
# depending on how your density is normalised next line may be superfluous
# also note that this is only correct for equal bins
# and, finally, to be unimpeachably rigorous, this disregards the probability
# mass outside the field of view, so it calculates probability condtional
# on being in the field of view
F /= F[-1]
boundaries = fsorted.searchsorted(levels)
new_levels = F[boundaries]

Now, for you to be able to use this your plot program must either allow you to freely choose the contour labels or at least to choose the levels at which to put the contours. In the latter case, assuming there's a kwarg 'levels'

# make a copy to avoid problems with in-place shuffling
# i.e. overwriting positions whose original values are still to be read out
F[order] = F.copy()
F.shape = f.shape
cset = ax.contour(xx, yy, F, levels=new_levels, colors='k')

I've copied the following from one of your comments to make it more visible

Finally, if one wants to really have the probability within each filled area, this is a workaround that works: cb = fig.colorbar(cfset, ax = ax) values = cb.values.copy() values[1:] -= values[:-1].copy() cb.set_ticklabels(values) – Laura

edited Jan 13 '17 at 22:20

answered Jan 11 '17 at 03:19

Paul Panzer

51,835
3
54
99

Thank you. Although the answer does not fully work, it is getting me closer to the solution (at least in this specific case where the contours are monotonic). Could you please clarify what you do when you assign F[order] = 1 - F ? Thank you – Laura Jan 11 '17 at 16:01
I have found the problem. This solution works if one plots ax.contourf(xx,yy,(1-F),levels = new_levels). – Laura Jan 11 '17 at 17:00
Great! So you could probably also replace F[order] = 1 - F by F[order] = F and leave the other line untouched. Do you still need advice on what the F[order] = F line actually does? – Paul Panzer Jan 11 '17 at 17:02
Yes, that'd be great. It works but I do not get how. Instead, setting F[order] = F and leaving the other lines unchanged does not work for some reasons. Also, I guess it is more intuitive to set boundaries = F.searchsorted(levels), where levels is a list of probabilities, instead of probability densities. – Laura Jan 11 '17 at 17:14
Hm, strange. I just noticed you replaced contour with contourf; maybe that's why? What's the difference between contour and contourf, anyway? Re the F[order] = F: I'll try my best. Remember that we have two copies of your original density estimate: ff and fsorted. These are exactly the same data only arranged differently. ff is in the 'correct', original order; fsorted is, well, sorted to facilitate subsequent steps. Now, F is calculated from fsorted and therefore first arranged like fsorted, but, of course, the correct order is still that of ff. We therefore need to 'unsort' F. – Paul Panzer Jan 11 '17 at 17:57
The variable named 'order' can be thought of as the permutation one needs to apply to ff to gain fsorted (fsorted <- ff[order]). So, technically, we need to apply the inverse permutation to F. Conveniently, there is no need to calculate this inverse explicitly, because indexing the l.h.s. instead of the r.h.s. (as in F[order] <- F) happens to do the trick. I'm not sure how intuitive this behaviour (of numpy indexing) is to someone not used to it, but one way of making one's brain accept it is to think in terms of the underlying for loop: for i in range(N): F[order[i]] = F[i] – Paul Panzer Jan 11 '17 at 17:59
Ah, I think I know what's wrong. F[order]=F most likely overwrites some elements of F before they are read out. By contrast in F[order] = 1 - F, first a temporary holding the entire vector 1 - F is created and only then the values are written to F[order]. Could you try whether F[order] = F.copy() works? – Paul Panzer Jan 11 '17 at 18:08
contour draws contour and contourf fills areas between contours based on a given colormap. F[order] = F.copy() works. Could you please correct the answer in your post? I was perturbed by what happened doing F[order] = F which is why I did not fully understood the answer at first. Finally, if one wants to really have the probability within each filled area, this is a workaround that works: cb = fig.colorbar(cfset, ax = ax) values = cb.values.copy() values[1:] -= values[:-1].copy() cb.set_ticklabels(values) – Laura Jan 13 '17 at 12:45
Done. Please note that I also changed the assigment to new_levels in the same manner. Hope that's the correct thing to do. If not give me a shout and I'll (un-) fix it. – Paul Panzer Jan 13 '17 at 22:27

Python - integrate 2D Kernel Density Estimation within contour lines

1 Answers1