When using kdeplot on a seaborn PairGrid, the kernel density contours don't match up with the points

Question

My problem can be summed up by the plots at the bottom of this post.

They show progressive zooming in on the buggy PairGrid, with the key plots being in the left column. Essentially, the points in my PairGrid are annoyingly scattered. However, as can be seen in the 3rd plot, the bulk of them are still fairly localised in what I expected to be a Gaussian distribution.

Unfortunately, the KDE contour plot seems to completely miss the main bulk of the points, and orders itself around a few outliers.

Here's the code I'm using to generate the plots from a pandas DataFrame:

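import seaborn as sns
from matplotlib import pyplot as plt

# HP is the pandas DataFrame holding the data
g = sns.PairGrid(HP, diag_sharey=False)
# Lower triangle: KDE contours with the raw points drawn on top
g.map_lower(sns.kdeplot, n_levels=5)
g.map_lower(plt.scatter, marker='^', alpha=0.7, color='y')
# Upper triangle: raw points only
g.map_upper(plt.scatter, marker='+')
# Diagonal: univariate KDE of each variable
g.map_diag(sns.kdeplot)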

I'm trying to figure out why this is happening. Does the kdeplot select only a subsample of the points or what?

Can't say for sure without access to your data, but your distribution is extremely skewed, and a Gaussian KDE assumes that a Gaussian is a reasonably good fit to the distribution. — mwaskom, Nov 04 '16 at 14:19
Well my data is indeed pretty trashy, I'm trying to fix that separately :D. However, as bad as it may be, the KDE isn't doing what I expected. From looking at the above, it doesn't seem like the main bulk of the data is contributing any Gaussian kernels at all. But I guess it could also be a problem somewhere with the contour plotter if the pdf has too many features and small peaks. — Marses, Nov 04 '16 at 15:32
It's not a matter of good or bad data, it's a matter of data that matches statistical assumptions. For example, the x variable in the middle plot has extremely high kurtosis. Fitting that distribution with a Gaussian kernel whose width is chosen from the variance alone will make it look like the contours are "missing data". I'd encourage you to focus on a single variable or bivariate relationship and play around with the bandwidth of the KDE (or see what fitting a Gaussian to the distribution itself looks like) to get a better intuition for what is happening. — mwaskom, Nov 04 '16 at 16:15
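One way to try the bandwidth experiment suggested above, as a rough sketch rather than a diagnosis of the actual data: the snippet below uses scipy.stats.gaussian_kde directly instead of seaborn, and replaces the HP DataFrame with synthetic data (a tight Gaussian bulk plus a handful of distant outliers). The data, the seed, and the bandwidth value 0.05 are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic stand-in for one bivariate panel (NOT the original HP data):
# 500 points in a unit-variance bulk plus 10 widely scattered outliers.
bulk = rng.normal(loc=0.0, scale=1.0, size=(2, 500))
outliers = rng.normal(loc=0.0, scale=100.0, size=(2, 10))
data = np.hstack([bulk, outliers])  # shape (n_dims, n_points)

# Default bandwidth (Scott's rule): the kernel covariance is the data
# covariance scaled by factor**2, so outliers inflate the kernel width.
kde_default = gaussian_kde(data)

# Hand-picked smaller factor (an illustrative value, not a recommendation).
kde_narrow = gaussian_kde(data, bw_method=0.05)

# Compare the estimated density at the centre of the bulk.
center = np.zeros((2, 1))
density_default = kde_default(center)[0]
density_narrow = kde_narrow(center)[0]

print("bandwidth factors:", kde_default.factor, kde_narrow.factor)
print("density at bulk centre (default):", density_default)
print("density at bulk centre (narrow): ", density_narrow)
```

With data shaped like this, the default estimate is smeared out by the outlier-driven variance, so the density at the bulk comes out much lower than with the narrower kernel. Every point still contributes a kernel in both cases; only the kernel width changes, which would be consistent with contours that appear to ignore the bulk.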

0 Answers