
Let's use the famous Titanic dataset found here:

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls

Read it into a DataFrame named `df`.

I'm interested in visualizing survival rate per passenger segment, with passenger segments defined as a hexbin bucket of fare x age.

Generating the hexbin of those two features is fairly straightforward:

import seaborn as sns

sns.set(font_scale=1.5)
sns.set_style("white")

# jointplot creates its own figure; height=8 gives the intended 8x8 inch size
g = sns.jointplot(x="age", y="fare", data=df, kind="hex", height=8,
                  dropna=True,  # age has missing values
                  joint_kws={"gridsize": 22, "mincnt": 0},
                  xlim=(-20, 90), ylim=(-20, 300),
                  marginal_kws={"bins": 10, "color": "k"}, color="black")

[Figure: hexbin jointplot of age vs. fare]

But rather than density (which the marginal plots show anyway), I'd like the color of each hexagon to represent the survival rate of all passengers counted within that bin (`survived` is a binary 0/1 column of the DataFrame).

Answers here are somewhat helpful, but scatter plots are problematic for dense datasets, thus my use of a hexbin.

Any suggestions on how I might make this work?

samthebrand
    Hexbin can take anything as the `C` input (http://matplotlib.org/api/axes_api.html?highlight=hexbin#matplotlib.axes.Axes.hexbin) but I am not sure how to convince seaborn to take a different value for it. – tacaswell Jul 09 '16 at 00:19
    A non-programming issue is that this is a standard plot format, and the reason these are called "marginal distributions". If you don't want the marginal plots to be connected to the interior plot, you should probably plot them on different axes. Otherwise it will be hard to disentangle the usual meaning from your alternate meaning. – tom10 Jul 09 '16 at 15:27
