Let's use the famous Titanic dataset found here:
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls
and read it in as a DataFrame named df.
I'm interested in visualizing the survival rate per passenger segment, with passenger segments defined as hexbin buckets of fare x age.
Generating the hexbin of those two features is fairly straightforward:
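import seaborn as sns

sns.set(font_scale=1.5)
sns.set_style("white")
# jointplot creates its own (square) figure, so a separate plt.figure() call
# has no effect; height=8 gives the intended 8x8 size.
# stat_func and the marginal rug are omitted: recent seaborn releases no
# longer accept them.
g = sns.jointplot(x="age", y="fare", data=df, kind="hex", height=8,
                  joint_kws={"gridsize": 22, "mincnt": 0},
                  xlim=(-20, 90), ylim=(-20, 300),
                  marginal_kws={"bins": 10, "color": "k"}, color="black")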
But rather than density (which the marginal plots show anyway), I'd like the color of each hexagon to represent the survival rate of all passengers counted within that bin (survived is a binary 0/1 column of the DataFrame).
The answers here are somewhat helpful, but scatter plots are problematic for dense datasets, hence my use of a hexbin.
Any suggestions on how I might make this work?
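To make the target concrete, here is a minimal sketch of one possible approach, not necessarily the best one: matplotlib's own `hexbin` accepts a `C` array and a `reduce_C_function`, so taking the mean of the 0/1 `survived` values inside each hexagon should give a per-bin survival rate. The data below is a synthetic stand-in (an assumption, so the snippet runs without the spreadsheet); the column names `age`, `fare`, and `survived` follow the question, and the colormap and color limits are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the Titanic frame (assumption: the real df has
# columns named "age", "fare" and "survived", as described in the question).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.uniform(0, 80, n),
    "fare": rng.gamma(2.0, 20.0, n).clip(0, 250),
    "survived": rng.integers(0, 2, n),
})

# Drop rows with missing values in the plotted columns.
sub = df.dropna(subset=["age", "fare", "survived"])

fig, ax = plt.subplots(figsize=(8, 8))
hb = ax.hexbin(
    sub["age"], sub["fare"],
    C=sub["survived"],           # values aggregated within each hexagon
    reduce_C_function=np.mean,   # mean of 0/1 values = survival rate
    gridsize=22,
    extent=(-20, 90, -20, 300),  # same ranges as xlim/ylim in the question
    cmap="viridis", vmin=0, vmax=1,
)
ax.set_xlabel("age")
ax.set_ylabel("fare")
fig.colorbar(hb, ax=ax, label="survival rate")
```

Hexagons containing no passengers are simply not drawn. To keep seaborn's marginal plots, the same `hexbin` call could presumably be made on the `ax_joint` axes of a `sns.JointGrid`, with the marginals drawn separately.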