
I am trying to create a jointplot with seaborn using the following code:

import seaborn as sns 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

testdata = pd.DataFrame(np.array([[100, 1, 3], [5, 2, 6], [25, 3, -4]]), index=['A', 'B', 'C'], columns=['counts', 'X', 'Y'])
counts = testdata['counts'].values
sns.jointplot('X', 'Y', data=testdata, kind='kde', joint_kws={'weights':counts})
plt.savefig('test.png')

Passing joint_kws doesn't raise an error, but the weights are clearly not taken into account, as can be seen in the plot:

I also tried to do it with JointGrid, passing the weights to the marginal distributions:

g = sns.JointGrid('X', 'Y', data=testdata)
x = testdata['X'].values
y = testdata['Y'].values
g.ax_marg_x.hist(x, bins=np.arange(-10,10), weights=counts)
g.ax_marg_y.hist(y, bins=np.arange(-10,10), weights=counts, orientation='horizontal')
g.plot_marginals(sns.distplot)
g.plot_joint(sns.kdeplot, joint_kws={'weights':counts})
plt.savefig('test.png')

But this only works for the marginal distributions; the joint plot is still not weighted:

Does anyone have an idea how to do this?

double-beep
madcap
  • Okay, I might be out of my element here, but exactly what would you like to see? – May 20 '15 at 08:46
  • Sorry for being unclear. I want to weight the data points. The weights are 100, 5 and 25 for A, B and C, respectively, so data point 'A' should be much more important than 'B' and contribute much more to the distribution. The marginal distributions in the lower plot show this weighted distribution, compared with the marginal distributions in the upper plot. – madcap May 20 '15 at 08:53
  • Here is a way to do it without seaborn: https://gist.github.com/tillahoffmann/f844bce2ec264c1c8cb5#file-weighted_kde-ipynb – Dan Dec 20 '17 at 13:22
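For reference, the weighting described in the comments can be sketched outside seaborn: every point contributes a Gaussian kernel scaled by its share of the total count, so A (100 of 130) dominates. This is a minimal NumPy sketch under stated assumptions: an isotropic Gaussian kernel with a hand-picked bandwidth of 1.0 and a hypothetical helper `weighted_kde_2d`. It is not seaborn's or SciPy's KDE, which choose their bandwidths differently.

```python
import numpy as np

# Data from the question: counts act as per-point weights.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 6.0, -4.0])
counts = np.array([100.0, 5.0, 25.0])

def weighted_kde_2d(x, y, weights, bandwidth, grid_x, grid_y):
    """Evaluate a weighted 2D Gaussian KDE on a grid (hypothetical helper).

    Assumption: isotropic Gaussian kernel with a fixed, hand-picked bandwidth.
    """
    w = weights / weights.sum()            # normalise weights so they sum to 1
    gx, gy = np.meshgrid(grid_x, grid_y)   # shape (len(grid_y), len(grid_x))
    density = np.zeros_like(gx)
    for xi, yi, wi in zip(x, y, w):
        sq_dist = (gx - xi) ** 2 + (gy - yi) ** 2
        density += wi * np.exp(-sq_dist / (2 * bandwidth ** 2))
    # Each kernel integrates to 1, so the weighted sum integrates to 1 too.
    return density / (2 * np.pi * bandwidth ** 2)

grid_x = np.linspace(-2, 6, 161)
grid_y = np.linspace(-8, 10, 361)
z = weighted_kde_2d(x, y, counts, bandwidth=1.0, grid_x=grid_x, grid_y=grid_y)

# Grid location of the highest peak; with these weights it should sit at A (1, 3).
iy, ix = np.unravel_index(np.argmax(z), z.shape)
print(grid_x[ix], grid_y[iy])
```

Drawing `z` on the joint axes, e.g. with matplotlib's `contourf(grid_x, grid_y, z)`, would then show a density dominated by point A.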

3 Answers


Unfortunately this seems to be impossible.

There was a feature request filed in December 2015, but it was closed as will-not-fix.

It is also discussed in this StackOverflow question: weights option for seaborn distplot?

Community
Tobber

I know this dates from a while back, but I have been able to use the weights in jointplots using:

p = sns.jointplot(data=v, x="x", y="y", kind="hist", weights=v.weights, bins=50)

where v is a DataFrame with columns [x, y, weights].
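What a `weights` argument changes in a joint histogram is how much each row adds to its bin: its weight instead of 1. The sketch below illustrates that idea with NumPy's `np.histogram2d`, which accepts a `weights` array. It assumes the question's three points in place of `v` and unit-wide bins chosen for illustration; it is not seaborn's internal code.

```python
import numpy as np

# Hypothetical stand-in for the DataFrame `v` with columns [x, y, weights],
# filled with the values from the question's testdata.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 6.0, -4.0])
weights = np.array([100.0, 5.0, 25.0])

bins = np.arange(-10, 11)  # unit-wide bin edges from -10 to 10 (20 bins per axis)

# Weighted: each point adds its weight to the bin it falls into.
hist, x_edges, y_edges = np.histogram2d(x, y, bins=[bins, bins], weights=weights)

# Unweighted, for comparison: every point adds exactly 1.
hist_unweighted, _, _ = np.histogram2d(x, y, bins=[bins, bins])

# Total mass: sum of the weights versus the number of points.
print(hist.sum(), hist_unweighted.sum())
```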

jkobject

You're really close.

The thing to keep in mind is that jointplot does the following (heavily paraphrased):

def jointplot(x, y, data=None, ..., joint_kws):
    g = sns.JointGrid(...)
    g.plot_joint(..., **joint_kws)

So when you call g.plot_joint yourself, just feed it normal kwargs:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

testdata = pd.DataFrame(
    np.array([[100, 1, 3], [5, 2, 6], [25, 3, -4]]), 
    index=['A', 'B', 'C'], 
    columns=['counts', 'X', 'Y']
)
counts = testdata['counts'].values

g = sns.JointGrid('X', 'Y', data=testdata)
g.plot_marginals(sns.distplot)
g.plot_joint(sns.kdeplot, weights=counts)


Now I'm not sure if that looks right, but it didn't barf, so that's worth something.
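The difference between passing `joint_kws={'weights': counts}` to `plot_joint` (as in the question) and passing `weights=counts` (as in this answer) is ordinary Python keyword forwarding. A pure-Python sketch, with hypothetical stand-ins `plot_func` (for `sns.kdeplot`) and `plot_joint` (for `JointGrid.plot_joint`):

```python
# Hypothetical stand-ins, not seaborn code: `plot_func` plays the role of
# sns.kdeplot and `plot_joint` forwards **kwargs the way the answer describes.
def plot_func(x, y, weights=None, **extra):
    """Report which keyword arguments actually arrived."""
    return {"weights": weights, "extra": extra}

def plot_joint(func, x, y, **kwargs):
    # Every keyword passed here is forwarded unchanged to func.
    return func(x, y, **kwargs)

counts = [100, 5, 25]

# Question's style: a keyword literally named joint_kws is forwarded,
# so func receives joint_kws={...} and weights stays None.
wrong = plot_joint(plot_func, [1, 2, 3], [3, 6, -4], joint_kws={"weights": counts})

# This answer's style: weights arrives as an ordinary keyword argument.
right = plot_joint(plot_func, [1, 2, 3], [3, 6, -4], weights=counts)

print(wrong)
print(right)
```

Forwarding `weights` correctly only helps if the receiving function actually uses it; per the comment on this answer, the seaborn version of the time still drew an unweighted plot.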

Paul H
  • This sounds reasonable, but the plot is still unweighted. Take point A (x=1, y=3). Its count is 100, and the sum of all counts is 130 (100 + 5 + 25), so A should be weighted 100/130, or 0.77, and the whole distribution should definitely have its biggest peak there. – madcap Jun 12 '15 at 12:47
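The arithmetic from this comment, spelled out as a short NumPy sketch: normalizing the counts gives each point's share of the total mass, whereas an unweighted KDE gives every point the same 1/3.

```python
import numpy as np

counts = np.array([100, 5, 25])   # counts for A, B, C from the question
share = counts / counts.sum()     # weighted: each point's share of 130
uniform = np.full(3, 1 / 3)       # unweighted: every point counts equally

print(share.round(2))             # A's share is 100/130
print(uniform.round(2))
```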