I have a dataset that looks like this (assume `Clicked` has 4 categories; `head(10)` happens to show only 2 of them):

    Rank Clicked
0   2.0 Cat4
1   2.0 Cat4
2   2.0 Cat4
3   1.0 Cat1
4   1.0 Cat4
5   2.0 Cat4
6   2.0 Cat4
7   3.0 Cat4
8   5.0 Cat4
9   5.0 Cat4

This code produces a stacked bar plot of the share of each `Clicked` category per `Rank`:

import matplotlib.pyplot as plt

# share of each Clicked category within each Rank
eee = (df.groupby(['Rank','Clicked'])['Clicked'].count()/df.groupby(['Rank'])['Clicked'].count())
eee.unstack().plot.bar(stacked=True)
plt.legend(['Cat1','Cat2','Cat3','Cat4'])
plt.xlabel('Rank')

Is there a way to achieve this with seaborn (or matplotlib) instead of the pandas plotting capabilities? I tried a few ways, both of running the seaborn code and of preprocessing the dataset so it's on the correct format, with no luck.

amestrian
  • Seaborn is just an api for matplotlib, and pandas is using matplotlib. pandas does stacked bars, seaborn does not. Use [ggplot styles in Python](https://stackoverflow.com/a/22543333/7758804), which is the style difference. – Trenton McKinney Nov 04 '21 at 23:14
  • It should be `df.groupby(['Rank'])['Clicked'].value_counts(normalize=True).unstack().plot(kind='bar', stacked=True)`. – Trenton McKinney Nov 04 '21 at 23:27
  • The groupby should be normalized with `value_counts`: [How to create a groupby dataframe without a multi-level index](https://stackoverflow.com/q/63970997/7758804) – Trenton McKinney Nov 04 '21 at 23:33
  • You can pass anything through to the underlying matplotlib call using the `**kwargs` at the end of each seaborn argument list. But! I often have to read the seaborn code to figure out exactly how to do that, and finding a style option for matplotlib can be easier. – cphlewis Nov 04 '21 at 23:41
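
The `value_counts(normalize=True)` suggestion in the comments can be checked against the question's count-over-count ratio. This is a minimal sketch, assuming `df` holds only the ten rows shown in the question (so only `Cat1` and `Cat4` appear); the names `by_ratio` and `by_value_counts` are illustrative, and the ratio is written with `Series.div(..., level="Rank")` to make the alignment explicit.

```python
import numpy as np
import pandas as pd

# Assumption: only the ten rows shown in the question (Cat1 and Cat4 only).
df = pd.DataFrame({
    "Rank": [2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 5.0],
    "Clicked": ["Cat4", "Cat4", "Cat4", "Cat1", "Cat4",
                "Cat4", "Cat4", "Cat4", "Cat4", "Cat4"],
})

# Count per (Rank, Clicked) divided by count per Rank, aligned on the Rank level.
pair_counts = df.groupby(["Rank", "Clicked"])["Clicked"].count()
rank_counts = df.groupby("Rank")["Clicked"].count()
by_ratio = pair_counts.div(rank_counts, level="Rank").unstack(fill_value=0)

# The one-step version from the comments.
by_value_counts = (
    df.groupby("Rank")["Clicked"]
      .value_counts(normalize=True)
      .unstack(fill_value=0)
)

# Put rows and columns in the same order before comparing the numbers.
by_ratio = by_ratio.sort_index().sort_index(axis=1)
by_value_counts = by_value_counts.sort_index().sort_index(axis=1)
same = np.allclose(by_ratio.to_numpy(), by_value_counts.to_numpy())
print(same)
```

Either table can then be passed to `.plot(kind='bar', stacked=True)` as in the question.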

2 Answers

For stacked bars normalized to 1, use `histplot` with `multiple="fill"`, e.g.:

import seaborn as sns

tips = sns.load_dataset("tips")
sns.histplot(
    data=tips,
    x="size", hue="day",
    multiple="fill", stat="proportion",
    discrete=True, shrink=.8
)

mwaskom

Seaborn's `barplot` doesn't support stacking, so you need to plot the cumulative sum and overlay the bars:

import pandas as pd
import seaborn as sns

# calculate the distribution of `Clicked` per `Rank`
distribution = pd.crosstab(df.Rank, df.Clicked, normalize='index')

# plot the cumsum, with reverse hue order
sns.barplot(data=distribution.cumsum(axis=1).stack().reset_index(name='Dist'),
            x='Rank', y='Dist', hue='Clicked',
            hue_order = distribution.columns[::-1],   # reverse hue order so that the taller bars are plotted first
            dodge=False)
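
To see why overlaying cumulative bars looks like stacking, it can help to print the two tables side by side. This is a small sketch, assuming `df` holds only the ten rows shown in the question; `cumulative` and `visible` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumption: only the ten rows shown in the question (Cat1 and Cat4 only).
df = pd.DataFrame({
    "Rank": [2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 5.0],
    "Clicked": ["Cat4", "Cat4", "Cat4", "Cat1", "Cat4",
                "Cat4", "Cat4", "Cat4", "Cat4", "Cat4"],
})

distribution = pd.crosstab(df["Rank"], df["Clicked"], normalize="index")
cumulative = distribution.cumsum(axis=1)   # height of each overlaid bar

print(distribution)
print(cumulative)

# The part of each bar that stays visible is the gap to the previous
# cumulative value, which is the original proportion again.
visible = cumulative.diff(axis=1)
visible.iloc[:, 0] = cumulative.iloc[:, 0]
print(np.allclose(visible, distribution))
```

Because the last category's cumulative bar always reaches 1, it has to be drawn first so that the shorter bars end up on top of it.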

Alternatively (and preferably), reverse the cumsum direction; then you don't need to reverse the hue order:

sns.barplot(data=distribution.iloc[:,::-1].cumsum(axis=1)       # we reverse cumsum direction here
                       .stack().reset_index(name='Dist'),
            x='Rank', y='Dist', hue='Clicked',
            hue_order=distribution.columns,                     # forward order
            dodge=False)
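
Since the question also allows plain matplotlib, the same chart can be drawn without the cumsum overlay by passing `bottom=` to `Axes.bar`. This is a sketch rather than part of the original answer; it assumes `df` holds only the ten rows shown in the question and places the bars at evenly spaced positions, one per `Rank` value.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumption: only the ten rows shown in the question (Cat1 and Cat4 only).
df = pd.DataFrame({
    "Rank": [2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 5.0],
    "Clicked": ["Cat4", "Cat4", "Cat4", "Cat1", "Cat4",
                "Cat4", "Cat4", "Cat4", "Cat4", "Cat4"],
})

distribution = pd.crosstab(df["Rank"], df["Clicked"], normalize="index")

fig, ax = plt.subplots()
x = np.arange(len(distribution))        # one bar position per Rank
bottom = np.zeros(len(distribution))    # running top of each stack

for category in distribution.columns:
    heights = distribution[category].to_numpy()
    ax.bar(x, heights, bottom=bottom, label=category)
    bottom = bottom + heights           # next category starts where this one ended

ax.set_xticks(x)
ax.set_xticklabels(distribution.index)
ax.set_xlabel("Rank")
ax.set_ylabel("Proportion")
ax.legend(title="Clicked")
```

Each segment here carries its own (non-cumulative) height, which may be more convenient if per-segment labels are needed later.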

Quang Hoang
  • I don't know why someone downvoted your answer... it's really good! I do have a question: how could I give a custom hue_order, for example Cat2, Cat4, Cat1, Cat3? I tried passing it as a list, but it's not accepted. – amestrian Nov 05 '21 at 11:44
  • @Quang, is there a way to add labels (which are not cumulative) to this chart? – PriyankaJ Dec 12 '22 at 12:16