How to plot stacked 100% bar plot with seaborn for categorical data

Question

I have a dataset that looks like this (assume there are 4 categories in Clicked; head(10) only shows 2 of them):

    Rank Clicked
0   2.0 Cat4
1   2.0 Cat4
2   2.0 Cat4
3   1.0 Cat1
4   1.0 Cat4
5   2.0 Cat4
6   2.0 Cat4
7   3.0 Cat4
8   5.0 Cat4
9   5.0 Cat4

The following code produces the plot:

import matplotlib.pyplot as plt

# share of each Clicked category within each Rank
eee = (df.groupby(['Rank', 'Clicked'])['Clicked'].count() / df.groupby(['Rank'])['Clicked'].count())
eee.unstack().plot.bar(stacked=True)
plt.legend(['Cat1', 'Cat2', 'Cat3', 'Cat4'])
plt.xlabel('Rank')

Is there a way to achieve this with seaborn (or matplotlib) instead of the pandas plotting capabilities? I tried a few ways, both running the seaborn code and preprocessing the dataset into the correct format, with no luck.

Seaborn is just an API for matplotlib, and pandas also uses matplotlib. pandas does stacked bars; seaborn does not. The remaining difference is style: use [ggplot styles in Python](https://stackoverflow.com/a/22543333/7758804). — Trenton McKinney, Nov 04 '21 at 23:14
It should be `df.groupby(['Rank'])['Clicked'].value_counts(normalize=True).unstack().plot(kind='bar', stacked=True)`. — Trenton McKinney, Nov 04 '21 at 23:27
The groupby result should be normalized with value_counts; see [How to create a groupby dataframe without a multi-level index](https://stackoverflow.com/q/63970997/7758804). — Trenton McKinney, Nov 04 '21 at 23:33
You can pass anything through to the underlying matplotlib call using the `**kwargs` at the end of each seaborn argument list. But! I often have to read the seaborn code to figure out exactly how to do that, and finding a style option for matplotlib can be easier. — cphlewis, Nov 04 '21 at 23:41
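To make the preprocessing point in the comments concrete, here is a small sketch (assuming only pandas) that rebuilds the ten sample rows from the question as a stand-in `df` and computes the per-Rank shares three ways: the question's groupby ratio, `value_counts(normalize=True)` as suggested above, and `pd.crosstab(..., normalize='index')`. On this data the three tables should agree; only Cat1 and Cat4 appear because the sample has no Cat2/Cat3 rows.

```python
import pandas as pd

# Stand-in for the real df: the ten sample rows shown in the question.
df = pd.DataFrame({
    "Rank": [2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 5.0],
    "Clicked": ["Cat4", "Cat4", "Cat4", "Cat1", "Cat4",
                "Cat4", "Cat4", "Cat4", "Cat4", "Cat4"],
})

# 1) The question's approach: per-(Rank, Clicked) counts divided by per-Rank counts.
ratio = (df.groupby(["Rank", "Clicked"])["Clicked"].count()
         / df.groupby(["Rank"])["Clicked"].count()).unstack(fill_value=0)

# 2) The comment's approach: normalized value counts within each Rank group.
vc = (df.groupby("Rank")["Clicked"]
        .value_counts(normalize=True)
        .unstack(fill_value=0))

# 3) A cross-tabulation normalized over each row (i.e. within each Rank).
ct = pd.crosstab(df["Rank"], df["Clicked"], normalize="index")

print(ct)
```

The crosstab form is the one the accepted answer below builds on, since it yields a wide table (one column per category) in a single call.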

score 7 · Answer 1 · answered Nov 05 '21 at 11:09


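Seaborn's histplot can do this directly: multiple="fill" stacks the hue levels and scales each bar to a total of 1. For example, with the built-in tips dataset: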

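import seaborn as sns

tips = sns.load_dataset("tips")
sns.histplot(
    data=tips,
    x="size", hue="day",
    multiple="fill",      # stack the hue levels and scale each bar to a total of 1
    stat="proportion",
    discrete=True,        # bins of width 1 centred on each integer value of x
    shrink=.8             # narrow each bar to 80% of the bin width
)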


mwaskom


Quang Hoang · Accepted Answer · 2021-11-05T01:46:16.080

Seaborn's barplot doesn't support stacked bars, so you need to plot the cumulative sums (cumsum) and draw the bars on top of each other:

import pandas as pd
import seaborn as sns

# calculate the distribution of `Clicked` per `Rank`
distribution = pd.crosstab(df.Rank, df.Clicked, normalize='index')

# plot the cumsum, with reverse hue order
sns.barplot(data=distribution.cumsum(axis=1).stack().reset_index(name='Dist'),
            x='Rank', y='Dist', hue='Clicked',
            hue_order=distribution.columns[::-1],   # reverse hue order so that the taller bars are plotted first
            dodge=False)                            # draw the hue levels on top of each other, not side by side

Output:
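Why plotting the cumsum works: with `dodge=False` every hue level is drawn at the same x position starting from 0, so each later (shorter) bar covers the lower part of the bar drawn before it. What stays visible of a bar is the gap between consecutive cumulative values, which is exactly the original share. The sketch below (the question's sample rows as a stand-in `df`) builds the long-format table passed to `sns.barplot` and checks that differencing the cumulative columns recovers `distribution`.

```python
import pandas as pd

# Stand-in for the real df: the ten sample rows from the question.
df = pd.DataFrame({
    "Rank": [2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 5.0],
    "Clicked": ["Cat4", "Cat4", "Cat4", "Cat1", "Cat4",
                "Cat4", "Cat4", "Cat4", "Cat4", "Cat4"],
})
distribution = pd.crosstab(df.Rank, df.Clicked, normalize="index")

# Cumulative share across the categories in column order (Cat1, then Cat1 + Cat4).
cum = distribution.cumsum(axis=1)

# Long format expected by sns.barplot: one row per (Rank, Clicked) pair.
long_df = cum.stack().reset_index(name="Dist")
print(long_df)

# The visible part of each overlapping bar is the difference between neighbouring
# cumulative values; the first column has nothing under it, so it keeps its cumsum.
visible = cum.diff(axis=1).fillna(cum)
```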

Alternatively (and preferably), reverse the direction of the cumsum; then you don't need to reverse the hue order:

sns.barplot(data=distribution.iloc[:,::-1].cumsum(axis=1)       # we reverse cumsum direction here
                       .stack().reset_index(name='Dist'),
            x='Rank', y='Dist', hue='Clicked',
            hue_order=distribution.columns,                     # forward order
            dodge=False)

Output:
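Since the question also asks about plain matplotlib: there the stacking can be done directly with the `bottom=` argument of `Axes.bar`, so each segment is drawn with its own (non-cumulative) height and no hue-order trick is needed. A minimal sketch, again with the sample rows standing in for `df`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the real df: the ten sample rows from the question.
df = pd.DataFrame({
    "Rank": [2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 5.0],
    "Clicked": ["Cat4", "Cat4", "Cat4", "Cat1", "Cat4",
                "Cat4", "Cat4", "Cat4", "Cat4", "Cat4"],
})

# Share of each Clicked category within each Rank (rows sum to 1).
distribution = pd.crosstab(df.Rank, df.Clicked, normalize="index")

fig, ax = plt.subplots()
x = np.arange(len(distribution))       # one slot per Rank
bottom = np.zeros(len(distribution))   # current top of each stack
for cat in distribution.columns:
    heights = distribution[cat].to_numpy()
    ax.bar(x, heights, bottom=bottom, label=cat)  # segment starts where the previous one ended
    bottom = bottom + heights

ax.set_xticks(x)
ax.set_xticklabels(distribution.index)
ax.set_xlabel("Rank")
ax.set_ylabel("Proportion")
ax.legend(title="Clicked")
```

Because each segment keeps its true height here, `ax.bar_label(container, label_type="center")` on each entry of `ax.containers` would label the segments with their own shares.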

I don't know why someone downvoted your answer... it's really good! I do have a question: how could I give a custom hue_order, for example Cat2, Cat4, Cat1, Cat3? I tried passing it as a list, but it isn't accepted. — amestrian, Nov 05 '21 at 11:44
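On the custom-order question: with the cumsum approach the stacking comes from the column order of `distribution` at the moment `cumsum` runs, so changing `hue_order` alone only changes the drawing order and the cumulative heights no longer line up. A sketch of one fix, assuming the reversed-cumsum variant from the answer: reorder the columns first, then pass the same list as `hue_order`. `custom_order` is a hypothetical name, and `reindex(..., fill_value=0.0)` adds categories that never occur in the data (Cat2 and Cat3 in the sample) as all-zero columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import pandas as pd
import seaborn as sns

# Stand-in for the real df: the ten sample rows from the question.
df = pd.DataFrame({
    "Rank": [2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 5.0],
    "Clicked": ["Cat4", "Cat4", "Cat4", "Cat1", "Cat4",
                "Cat4", "Cat4", "Cat4", "Cat4", "Cat4"],
})
distribution = pd.crosstab(df.Rank, df.Clicked, normalize="index")

# Hypothetical custom order taken from the comment.
custom_order = ["Cat2", "Cat4", "Cat1", "Cat3"]

# Reorder the columns (missing categories become zeros) and keep the axis name
# 'Clicked', which becomes the hue column after stack/reset_index.
dist = (distribution.reindex(columns=custom_order, fill_value=0.0)
        .rename_axis(columns="Clicked"))

# Reversed cumsum: the first category in custom_order gets the tallest bar,
# is drawn first, and therefore ends up as the top segment.
long_df = (dist.iloc[:, ::-1].cumsum(axis=1)
           .stack().reset_index(name="Dist"))

ax = sns.barplot(data=long_df, x="Rank", y="Dist", hue="Clicked",
                 hue_order=custom_order, dodge=False)
```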
@Quang, is there a way to add labels (which are not cumulative) to this chart? — PriyankaJ, Dec 12 '22 at 12:16
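One way to add non-cumulative labels, sketched on the reversed-cumsum variant and the question's sample rows: compute the middle of each visible segment from the cumulative bar tops and write the original share there with `Axes.text`. This relies on an assumption worth checking against your seaborn version: that the `Rank` categories are placed at x = 0, 1, 2, ... in sorted order, which matches the sorted rows of the crosstab.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import pandas as pd
import seaborn as sns

# Stand-in for the real df: the ten sample rows from the question.
df = pd.DataFrame({
    "Rank": [2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 5.0],
    "Clicked": ["Cat4", "Cat4", "Cat4", "Cat1", "Cat4",
                "Cat4", "Cat4", "Cat4", "Cat4", "Cat4"],
})
distribution = pd.crosstab(df.Rank, df.Clicked, normalize="index")

# Reversed cumsum: tops.loc[rank, cat] is the height of the bar drawn for `cat`,
# i.e. its own share plus the shares of all categories after it.
tops = distribution.iloc[:, ::-1].cumsum(axis=1)

ax = sns.barplot(data=tops.stack().reset_index(name="Dist"),
                 x="Rank", y="Dist", hue="Clicked",
                 hue_order=distribution.columns, dodge=False)

# Assumption: seaborn puts the Rank categories at x = 0, 1, 2, ... in sorted order.
for i, rank in enumerate(distribution.index):
    for cat in distribution.columns:
        share = distribution.loc[rank, cat]
        if share > 0:                              # empty segments get no label
            y = tops.loc[rank, cat] - share / 2    # middle of the visible segment
            ax.text(i, y, f"{share:.0%}", ha="center", va="center")
```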
