How to produce a stacked bar plot for the value counts of all columns

Question

My dataframe has more than 10 columns, and each column contains values like yes/no/na/not specified.

I want to count the occurrences of each value in every column and create a stacked bar graph from those counts.

Below is the image that I need:

[Image: stacked bar chart]

Nick ODell · Answer 1 · 2022-11-24T03:46:35.230

Yes, this is possible, but you'll need to reshape your data a little first.

Here's the dataset I'm using in this example. It has the labels in the columns, and 1000 random Yes, No or Maybe responses as values.

    asthma boneitis diabetes pneumonia
0       No       No      Yes     Maybe
1       No       No       No       Yes
2       No       No       No        No
3      Yes       No       No     Maybe
4      Yes       No       No     Maybe
..     ...      ...      ...       ...
995     No       No      Yes        No
996  Maybe      Yes      Yes       Yes
997     No       No       No       Yes
998     No       No       No        No
999     No       No    Maybe        No

In order to format the data correctly for the plot, do this:

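# Stack all columns into one long Series, count each response per original
# column (index level 1), then pivot the responses back out into columns
df2 = df.stack().groupby(level=[1]).value_counts().unstack()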
# Preferred order of stacked bar elements
stack_order = ['Yes', 'Maybe', 'No']
df2 = df2[stack_order]
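To see what the reshaping does, here is a minimal sketch on a dataset small enough to check by hand. The `small` frame and its values are made up for illustration. It also passes `fill_value=0` to `unstack()`, which is an addition on my part: when a response never occurs in a column (here, 'Maybe' in asthma), the count would otherwise come out as NaN.

```python
import pandas as pd

# Tiny hypothetical dataset (not the 1000-row example above)
small = pd.DataFrame({
    'asthma':   ['No', 'No', 'Yes', 'No'],
    'diabetes': ['Yes', 'Maybe', 'No', 'Yes'],
})

# Same chain as above; fill_value=0 replaces missing combinations with 0
counts = (
    small.stack()
    .groupby(level=1)
    .value_counts()
    .unstack(fill_value=0)
)

# Preferred order of stacked bar elements
stack_order = ['Yes', 'Maybe', 'No']
counts = counts[stack_order]
print(counts)
```

Each row of `counts` is one of the original columns, and each column is one response, which is the layout that the stacked bar plot expects.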

At this point, the data looks like this:

           Yes  Maybe   No
asthma      83     83  834
boneitis   174    173  653
diabetes   244    260  496
pneumonia  339    363  298

Now you're ready to plot the data. Here's the code to do that:

df2.plot.bar(rot=0, stacked=True)

I'm using rot=0 to keep the text labels horizontal (by default, pandas rotates bar plot labels 90 degrees so that they run vertically), and stacked=True to produce a stacked bar chart.
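If you want axis labels and a saved image as well, something like the following sketch should work. It rebuilds `df2` from the counts in the table above so that it runs on its own. The label text, legend title, and output filename are my own placeholders, not part of the original question.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, so this runs without a display
import pandas as pd

# Counts copied from the reshaped table above
df2 = pd.DataFrame(
    {
        'Yes':   [83, 174, 244, 339],
        'Maybe': [83, 173, 260, 363],
        'No':    [834, 653, 496, 298],
    },
    index=['asthma', 'boneitis', 'diabetes', 'pneumonia'],
)

# plot.bar returns a matplotlib Axes, which can be customized further
ax = df2.plot.bar(rot=0, stacked=True)
ax.set_xlabel('Condition')            # placeholder label text
ax.set_ylabel('Number of responses')  # placeholder label text
ax.legend(title='Response')

# Hypothetical output filename
ax.figure.savefig('stacked_bar.png', bbox_inches='tight')
```

Because `plot.bar` hands back a regular matplotlib Axes, any further styling (titles, colors, tick formatting) can be done with the usual matplotlib calls.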

The plot looks like this:

Appendix

Code for generating test data set:

import pandas as pd
import numpy as np

categories = [
    'asthma',
    'boneitis',
    'diabetes',
    'pneumonia',
]

distribution = {
    cat: (i + 1) / 12
    for i, cat in enumerate(categories)
}

df = pd.DataFrame({
    cat: np.random.choice(['Yes', 'Maybe', 'No'], size=1000, p=[prob, prob, 1 - 2 * prob])
    for cat, prob in distribution.items()
})
