4

I am working with the Titanic passenger dataset (from Kaggle) as part of a Udacity course. I am using a Seaborn FacetGrid to look at passenger age distribution profiles by Travel class and Gender - with hue as 'Survived' (1/0).

The plot is working well, and I want to add vertical mean lines to each subplot - but in different colours (and with different annotation) for each of the two 'hues' in each subplot (1/0). The ' vertical_mean_line ' function in code below works great on plots without the multiple 'hue' data - but I can't find a way to plot different lines for each hue

Any ideas if it is possible to do this within Seaborn?

Current Seaborn FacetGrid plot output:

Seaborn FacetGrid plot

Code:

sns.set()
sns.set_context('talk')
sns.set_style('darkgrid')
grid = sns.FacetGrid(titanic_data.loc[titanic_data['is_child_def'] == False], col='Sex', row = 'Pclass', hue='Survived' ,size=3.2, aspect=2)
grid.map(sns.kdeplot, 'Age', shade=True)
grid.set(xlim=(14, titanic_data['Age'].max()), ylim=(0,0.06))
grid.add_legend()


# Add vertical lines for mean age on each plot
def vertical_mean_line_survived(x, **kwargs):
    plt.axvline(x.mean(), linestyle = '--', color = 'g')
    #plt.text(x.mean()+1, 0.052, 'mean = '+str('%.2f'%x.mean()), size=12)
    #plt.text(x.mean()+1, 0.0455, 'std = '+str('%.2f'%x.std()), size=12)

grid.map(vertical_mean_line_survived, 'Age') 

# Add text to each plot for relevant popultion size
# NOTE - don't need to filter on ['Age'].isnull() for children, as 'is_child'=True only possible for children with 'Age' data
for row in range(grid.axes.shape[0]):
    grid.axes[row, 0].text(60.2, 0.052, 'Survived n = '+str(titanic_data.loc[titanic_data['Pclass']==row+1].loc[titanic_data['is_child_def']==False].loc[titanic_data['Age'].isnull()==False].loc[titanic_data['Survived']==1]['is_male'].sum()), size = 12)
    grid.axes[row, 1].text(60.2, 0.052, 'Survived n = '+str(titanic_data.loc[titanic_data['Pclass']==row+1].loc[titanic_data['is_child_def']==False].loc[titanic_data['Age'].isnull()==False].loc[titanic_data['Survived']==1]['is_female'].sum()), size = 12)
    grid.axes[row, 0].text(60.2, 0.047, 'Perished n = '+str(titanic_data.loc[titanic_data['Pclass']==row+1].loc[titanic_data['is_child_def']==False].loc[titanic_data['Age'].isnull()==False].loc[titanic_data['Survived']==0]['is_male'].sum()), size = 12)
    grid.axes[row, 1].text(60.2, 0.047, 'Perished n = '+str(titanic_data.loc[titanic_data['Pclass']==row+1].loc[titanic_data['is_child_def']==False].loc[titanic_data['Age'].isnull()==False].loc[titanic_data['Survived']==0]['is_female'].sum()), size = 12)



grid.set_ylabels('Frequency density', size=12)

# Squash down a little and add title to facetgrid    
plt.subplots_adjust(top=0.9)
grid.fig.suptitle('Age distribution of adults by Pclass and Sex for Survived vs. Perished')
ImportanceOfBeingErnest
  • 321,279
  • 53
  • 665
  • 712
chrisrb10
  • 189
  • 1
  • 3
  • 12
  • It took me a while to reproduce the issue. Could you please next time asking a question, produce a [mcve] which can be directly copied and pasted. You don't actually need this kind of complex dataframe to ask a question about hue in FacetGrid mapping, right? – ImportanceOfBeingErnest Jul 07 '17 at 01:41

1 Answers1

10

The kwargs contain the label and the color of the respective hue. Therefore, using

def vertical_mean_line_survived(x, **kwargs):
    ls = {"0":"-","1":"--"}
    plt.axvline(x.mean(), linestyle =ls[kwargs.get("label","0")], 
                color = kwargs.get("color", "g"))
    txkw = dict(size=12, color = kwargs.get("color", "g"), rotation=90)
    tx = "mean: {:.2f}, std: {:.2f}".format(x.mean(),x.std())
    plt.text(x.mean()+1, 0.052, tx, **txkw)

we would get

enter image description here

ImportanceOfBeingErnest
  • 321,279
  • 53
  • 665
  • 712
  • Many thanks - that's great. And apologies for the overly long question code - I'm a relative stackoverflow novice. On the real data, the mean lines for populations 0,1 are pretty close, so the alignment of xmean()+1 can overlay them. Which raises 2 follow ups: 1) How can I vary the text location parameters by hue parameter? 2) Is there a function to return the max y value of the kde curve (so I set place y co-ords relative to that? Many thanks. – chrisrb10 Jul 07 '17 at 10:07
  • 1. you get the hue parameter as `kwargs.get("label")`, so you can do `if kwargs.get("label") =="0": ... else: ... ` and set different locations for both cases. 2. The problem is that you would need to get the y value of the kde curve inside the labeling function. I guess you can recalculate the kde curve inside it, e.g. using [scipy.stats.gaussian_kde.](https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.gaussian_kde.html) and then take its maximum. It does seem a bit overkill though. – ImportanceOfBeingErnest Jul 07 '17 at 10:20
  • Thanks. `kwargs.get('label')` works perfectly. Agree that recalculating kde curve for a label position is overkill - too ambitious for now. – chrisrb10 Jul 07 '17 at 11:10