
I would like to add count and percentage labels to a grouped bar chart, but I haven't been able to figure it out.
I've seen examples for count or percentage for single bars, but not for grouped bars.

The data looks something like this (not the real numbers):
   age_group  Mis  surv  unk  death  total  surv_pct  death_pct
0        0-9    1     2    0      3      6     100.0        0.0
1      10-19    2     1    0      1      4      99.9        0.0
2      20-29    0     3    0      1      4      99.9        0.0
3      30-39    0     7    1      2     10     100.0        0.0
4      40-49    0     5    0      1      6      99.7        0.3
5      50-59    0     6    0      4     10      99.3        0.3
6      60-69    0     7    1      4     12      98.0        2.0
7      70-79    1     8    2      5     16      92.0        8.0
8        80+    0    10    0      7     17      81.0       19.0

And the chart looks something like this: [image: grouped bar chart]

I created the chart with this code:

ax = df.plot(y=['death', 'surv'],
             kind='barh',
             figsize=(20,9),
             rot=0,
             title= '\n\n surv and deaths by age group')

ax.legend(['Deaths', 'Survivals']);
ax.set_xlabel('\nCount');
ax.set_ylabel('Age Group\n');


How could I add count and percentage labels to the grouped bars? I would like it to look something like this chart: [image: grouped bar chart with count and percentage labels]

Mr. T
laura
  • https://stackoverflow.com/questions/30228069/how-to-display-the-value-of-the-bar-on-each-bar-with-pyplot-barh – BigBen Jan 14 '21 at 21:23
  • @BigBen this shows one value on the labels. I'm looking to show both count and percentage for grouped bars. – laura Jan 14 '21 at 21:36
  • @BigBen I just uploaded a chart image of how I would like the chart to look. Thank you for the question. – laura Jan 14 '21 at 21:57
  • How strange. I just had a similar seaborn question. I am sure you will be able to adapt Ben's link with this: https://stackoverflow.com/a/65725322/8881141 – Mr. T Jan 14 '21 at 22:01
  • Hi @Mr.T, I deleted the link because nobody commented on it, so I thought it wasn't relevant. Yes, I was able to find a solution with the help of those links and a Python meetup group, and I'm very grateful for it. There was no bad intention. I'll also add the appropriate credits. Unfortunately, due to having other responsibilities, which include teaching remote school to my three young children, I do my code work one tiny bit at a time and didn't have time to do it all at once. – laura Jan 18 '21 at 17:27
  • Thanks for replying. There is a current tendency on SO to treat people who answer as if they were employees: "Actually, I wanted a log-log plot, and I want a different color scheme", and if you then change the code, they simply disappear, and your code turns up in their next question. I apologize if I wrongly assumed you are in this category. No credit necessary, a simple comment with "Thank you, this helped" is enough. And if an answer solved your problem, you should consider [accepting it](https://stackoverflow.com/help/accepted-answer); otherwise, the question is considered open. – Mr. T Jan 18 '21 at 17:36
  • @Mr.T, Thank you for sharing the link to accepting answers. I've been trying to figure out how to do that. This is my first time posting a question that has been answered. I also haven't had a chance to try the code you posted as an answer below, as the meetup group came up with their own solution, but I'll try it when I have a moment. – laura Jan 18 '21 at 21:29

1 Answer


Since nobody else has suggested anything, here is one way to approach it with your dataframe structure.

from matplotlib import pyplot as plt
import pandas as pd

df = pd.read_csv("test.txt", sep=r"\s+")

cat = ['death', 'surv']

ax = df.plot(y=cat,
             kind='barh',
             figsize=(20, 9),
             rot=0,
             title= '\n\n surv and deaths by age group')

#making space for the annotation
xmin, xmax = ax.get_xlim()
ax.set_xlim(xmin, 1.05 * xmax)

#connecting bar series with df columns
for cont, col in zip(ax.containers, cat):
    #connecting each bar of the series with its absolute and relative values 
    for rect, vals, perc in zip(cont.patches, df[col], df[col+"_pct"]):
        #annotating each bar
        ax.annotate(f"{vals} ({perc:.1f}%)", (rect.get_width(), rect.get_y() + rect.get_height() / 2.),
                     ha='left', va='center', fontsize=10, color='black', xytext=(3, 0),
                     textcoords='offset points')

ax.set_yticklabels(df.age_group)
ax.set_xlabel('\nCount')
ax.set_ylabel('Age Group\n')
ax.legend(['Deaths', 'Survivals'], loc="lower right")
plt.show()

Sample output: [image: grouped bar chart with count and percentage labels]

If the percentages per category add up to 100%, one could also calculate them on the fly, so the percentage columns would not need to follow exactly the same naming scheme as the count columns. Another limitation is that the annotation font size, the axis scaling that makes room for the label of the longest bar, and the distance between bar and annotation are hard-coded rather than adaptive and may need fine-tuning.
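A rough sketch of the on-the-fly variant, assuming each percentage is meant to be the bar's share of its own series total (so the labels of one series sum to 100%). The helper name `count_pct_labels` is made up for illustration, and the numbers are the `death` column from the sample table in the question.

```python
def count_pct_labels(counts):
    """Build 'count (pct%)' strings, pct being each count's share of the sum.

    Hypothetical helper; accepts any iterable of numbers, e.g. a list
    or a pandas Series such as df["death"].
    """
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        # avoid dividing by zero when a series is empty or all zeros
        return [f"{c} (0.0%)" for c in counts]
    return [f"{c} ({100 * c / total:.1f}%)" for c in counts]


# 'death' column from the sample table in the question
death = [3, 1, 1, 2, 1, 4, 4, 5, 7]
labels = count_pct_labels(death)
print(labels)
```

In the annotation loop, `zip(cont.patches, count_pct_labels(df[col]))` could then stand in for the lookup of `df[col+"_pct"]`, with the label string passed straight to `ax.annotate`.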
However, I am not fond of mixing pandas and matplotlib plotting functions. I have had cases where the axis setup done by pandas interfered with matplotlib, and datetime objects ... well, let's not talk about that.
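For what it's worth, a matplotlib-only version could look roughly like the sketch below. It is not the code that produced the sample output. The counts are the `death` and `surv` columns from the question's sample table, the percentages are assumed to be each bar's share of its series total, and the `Agg` backend and the output file name are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt

age_groups = ["0-9", "10-19", "20-29", "30-39", "40-49",
              "50-59", "60-69", "70-79", "80+"]
# counts copied from the sample table in the question
series = {
    "Deaths": [3, 1, 1, 2, 1, 4, 4, 5, 7],
    "Survivals": [2, 1, 3, 7, 5, 6, 7, 8, 10],
}

fig, ax = plt.subplots(figsize=(20, 9))
positions = list(range(len(age_groups)))
height = 0.4  # thickness of one bar; two bars share each age group

for i, (name, vals) in enumerate(series.items()):
    total = sum(vals)
    # shift each series so the pair is centred on the tick position
    ys = [p + (i - 0.5) * height for p in positions]
    bars = ax.barh(ys, vals, height=height, label=name)
    for rect, v in zip(bars, vals):
        # assumed definition: share of the series total
        ax.annotate(f"{v} ({100 * v / total:.1f}%)",
                    (rect.get_width(), rect.get_y() + rect.get_height() / 2),
                    xytext=(3, 0), textcoords="offset points",
                    ha="left", va="center", fontsize=10)

ax.set_yticks(positions)
ax.set_yticklabels(age_groups)
# leave room for the label of the longest bar
ax.set_xlim(0, 1.05 * ax.get_xlim()[1])
ax.set_xlabel("Count")
ax.set_ylabel("Age Group")
ax.set_title("surv and deaths by age group")
ax.legend(loc="lower right")
fig.savefig("grouped_barh.png")
```

Because the tick positions and labels are set explicitly here, nothing depends on how pandas configures the axis.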

Mr. T