Out of the box, seaborn does a very good job of plotting a 2D KDE or a jointplot. However, it does not return anything like a function that I can evaluate to read the values of the estimated density numerically.

How can I numerically evaluate the density that sns.kdeplot or jointplot has drawn in the plot?

Just for completeness: I see something interesting in the scipy docs, stats.gaussian_kde, but I am getting very clunky density plots which, apparently because of a missing extent, are really off compared to the scatter plot. So I would like to stay away from the scipy KDE, at least until I figure out how to make it work and why pyplot is so much less "smart" than seaborn.

Anyhow, the evaluate method of scipy.stats.gaussian_kde does its job.
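For reference, a minimal sketch of evaluating a 2D density with scipy.stats.gaussian_kde. The toy data is an assumption made for illustration (it is not the data behind the plot above). The sketch also computes the extent that plt.imshow would need so that a density image lines up with a scatter plot of the same data.

```python
import numpy as np
from scipy import stats

# Toy data (an assumption for illustration): 500 correlated 2D points.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(scale=0.5, size=500)

# gaussian_kde expects an array of shape (n_dims, n_points).
kde = stats.gaussian_kde(np.vstack([x, y]))

# Density at individual points: pass a (2, m) array of coordinates.
points = np.array([[0.0, 1.0],   # x-coordinates
                   [0.0, 0.5]])  # y-coordinates
density_at_points = kde.evaluate(points)  # one value per point

# Density on a regular grid covering the data range.
xs = np.linspace(x.min(), x.max(), 100)
ys = np.linspace(y.min(), y.max(), 100)
xx, yy = np.meshgrid(xs, ys)
zz = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# Data-coordinate bounds of the grid, in the order imshow expects:
# plt.imshow(zz, extent=extent, origin="lower", aspect="auto")
extent = (xs[0], xs[-1], ys[0], ys[-1])
```

Without extent and origin="lower", imshow draws the array in pixel coordinates with the first row at the top, which is one likely reason a density image looks shifted or flipped relative to the scatter plot.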

Rho Phi
  • Chances are pretty good that sns.kdeplot and jointplot are punting to some other function to actually construct the density estimate. You could take a look at the source code to see what's being called, and then arrange to call that same function. – Robert Dodier Dec 28 '20 at 21:58
  • Seaborn uses [`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html). Could you add some toy data and the "clunky density plot" you tried? Seaborn isn't in the habit of returning functions; it creates visualizations. Also see the comments on [Can I retrieve the bandwidth used in a seaborn kdeplot?](https://stackoverflow.com/questions/65461136/can-i-retrieve-the-bandwidth-used-in-a-seaborn-kdeplot) for some pointers to the source code. – JohanC Dec 28 '20 at 22:15

1 Answer

I also faced this issue with the jointplot() method. I opened the file distributions.py under anaconda3/lib/python3.7/site-packages/seaborn/ and added these lines to the _bivariate_kdeplot() function:

print("xx=", xx[50])
print("yy=", yy[:, 50])
print("z=", z[50])

This prints 100 values from each of the xx, yy and z arrays at index 50. Here z is the density, and xx and yy are the grid coordinates in meshgrid form: their range is adjusted according to the bandwidth, cut and clip, and their resolution is set by the grid size given by the user. This gave me some idea of the actual values of the 2D KDE plot. If you print the entire array of each variable, you get 100 x 100 values for each.
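The same kind of xx, yy and z arrays can also be built without editing the installed library. The sketch below is only an approximation of the idea described above, not seaborn's actual code: kde_grid is a hypothetical helper, the defaults gridsize=100 and cut=3 are assumptions, and the per-axis bandwidth (Scott's factor times the sample standard deviation) is an assumed rule that may differ from what a given seaborn version uses.

```python
import numpy as np
from scipy import stats

def kde_grid(x, y, gridsize=100, cut=3):
    """Hypothetical helper: evaluate a 2D Gaussian KDE on a gridsize x gridsize mesh."""
    kde = stats.gaussian_kde(np.vstack([x, y]))
    # Assumed per-axis bandwidth: KDE scaling factor times the sample std.
    bw_x = kde.factor * np.std(x, ddof=1)
    bw_y = kde.factor * np.std(y, ddof=1)
    # Extend the grid by cut * bandwidth beyond the data range on each side.
    x_support = np.linspace(x.min() - cut * bw_x, x.max() + cut * bw_x, gridsize)
    y_support = np.linspace(y.min() - cut * bw_y, y.max() + cut * bw_y, gridsize)
    xx, yy = np.meshgrid(x_support, y_support)
    # Evaluate the density at every grid node and restore the mesh shape.
    z = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    return xx, yy, z

# Toy data (an assumption for illustration).
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = rng.normal(size=300)
xx, yy, z = kde_grid(x, y)

# xx[50] holds the 100 x-coordinates, yy[:, 50] the 100 y-coordinates,
# and z[50] the density along x at y = yy[50, 0].
```

Because the result is a plain NumPy array, values can be read off by index, or the kde object itself can be called at arbitrary coordinates instead of relying on the grid.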