My dataframe has more than 10 columns, and each column has values like yes/no/na/not specified.
I want to count the occurrences of each value in every column and create a stacked bar graph.
Below is the image that I need:
Yes, this is possible, but you'll need to reformat your data a little first.
Here's the dataset I'm using in this example. It has the labels as column names and 1000 random Yes, No or Maybe responses as values.
    asthma boneitis diabetes pneumonia
0       No       No      Yes     Maybe
1       No       No       No       Yes
2       No       No       No        No
3      Yes       No       No     Maybe
4      Yes       No       No     Maybe
..     ...      ...      ...       ...
995     No       No      Yes        No
996  Maybe      Yes      Yes       Yes
997     No       No       No       Yes
998     No       No       No        No
999     No       No    Maybe        No
In order to format the data correctly for the plot, do this:
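# stack() turns the frame into one long Series indexed by (row, column).
# Grouping on the column level and counting each response, then unstacking,
# gives one row per original column and one column per response.
df2 = df.stack().groupby(level=1).value_counts().unstack(fill_value=0)
# Preferred order of stacked bar elements
stack_order = ['Yes', 'Maybe', 'No']
df2 = df2[stack_order]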
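The question mentions answers such as "na" and "not specified". If some cells are real missing values (NaN) rather than the text "na", `stack()` can drop them (it does by default in pandas releases before 3.0), so they would not be counted. Also, `df2[stack_order]` raises a `KeyError` if one of the answers never occurs. A minimal sketch, assuming a small hypothetical DataFrame with columns `q1` and `q2`, that labels missing cells as "na" first and uses `reindex` so every answer appears in the preferred order:

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the shape the question describes: columns of
# 'yes' / 'no' / 'not specified' answers, plus genuinely missing cells.
df = pd.DataFrame({
    'q1': ['yes', 'no', np.nan, 'not specified', 'yes'],
    'q2': ['no', 'no', 'yes', np.nan, 'not specified'],
})

# Give missing cells an explicit label so they are counted, not dropped.
filled = df.fillna('na')

counts = filled.stack().groupby(level=1).value_counts().unstack(fill_value=0)

# reindex applies the preferred stack order and adds any answer that never
# occurs as a column of zeros, instead of raising KeyError.
stack_order = ['yes', 'no', 'na', 'not specified']
counts = counts.reindex(columns=stack_order, fill_value=0)
print(counts)
```

The resulting `counts` can be passed to `plot.bar(stacked=True)` exactly like `df2`.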
At this point, the data looks like this:
           Yes  Maybe   No
asthma      83     83  834
boneitis   174    173  653
diabetes   244    260  496
pneumonia  339    363  298
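Another way to build the same table, not the one used in this answer, is to call `value_counts` on each column with `DataFrame.apply` and transpose the result. The sketch below uses a small, made-up four-row DataFrame (an assumption standing in for the 1000-row test data) and builds the table both ways so they can be compared:

```python
import pandas as pd

# Small made-up stand-in for the 1000-row test data (assumption).
df = pd.DataFrame({
    'asthma':   ['No', 'No', 'Yes', 'Maybe'],
    'diabetes': ['Yes', 'No', 'No', 'No'],
})
stack_order = ['Yes', 'Maybe', 'No']

# Route used in the answer: long format, then count per column.
via_stack = df.stack().groupby(level=1).value_counts().unstack(fill_value=0)
via_stack = via_stack.reindex(columns=stack_order, fill_value=0)

# Alternative: value_counts on each column, then transpose so that the
# original column names become the rows.
via_apply = df.apply(pd.Series.value_counts).T
# A response missing from a column shows up as NaN here, so fill it with 0.
via_apply = via_apply.reindex(columns=stack_order).fillna(0).astype(int)

print(via_apply)
```

Both routes give the same counts; `apply` is shorter, while the `stack` route keeps the data in long format, which is convenient if you later want to group by other fields.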
Now you're ready to plot the data. Here's the code to do that:
df2.plot.bar(rot=0, stacked=True)
I'm using rot=0
to keep the x-axis labels horizontal (pandas bar plots rotate them 90 degrees by default), and stacked=True
to produce a stacked bar chart.
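`DataFrame.plot.bar` returns a matplotlib `Axes`, so the chart can be labelled and saved with ordinary matplotlib calls. A sketch assuming matplotlib is installed; the axis titles and the file name `stacked_bar.png` are illustrative choices, and the counts are copied from the table above:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the df2 table shown above.
df2 = pd.DataFrame(
    {'Yes': [83, 174, 244, 339],
     'Maybe': [83, 173, 260, 363],
     'No': [834, 653, 496, 298]},
    index=['asthma', 'boneitis', 'diabetes', 'pneumonia'],
)

# plot.bar returns the Axes it drew on, so labels can be set on it directly.
ax = df2.plot.bar(rot=0, stacked=True)
ax.set_xlabel('Condition')             # illustrative axis title
ax.set_ylabel('Number of responses')   # illustrative axis title
ax.legend(title='Response')
plt.tight_layout()
plt.savefig('stacked_bar.png')  # hypothetical output file name
```

In an interactive session, drop the `matplotlib.use('Agg')` line and call `plt.show()` instead of `plt.savefig(...)`.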
The plot looks like this:
Code for generating the test dataset:
import pandas as pd
import numpy as np
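categories = [
    'asthma',
    'boneitis',
    'diabetes',
    'pneumonia',
]
# P(Yes) = P(Maybe) = (i + 1) / 12 for the i-th category; P(No) is the rest
distribution = {
    cat: (i + 1) / 12
    for i, cat in enumerate(categories)
}
# 1000 random responses per category, drawn with those probabilities
df = pd.DataFrame({
    cat: np.random.choice(['Yes', 'Maybe', 'No'], size=1000, p=[prob, prob, 1 - 2 * prob])
    for cat, prob in distribution.items()
})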
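The snippet above draws new random numbers on every run, so your counts will differ from the table shown earlier. If you want repeatable output, one option (an assumption, not part of the original answer) is NumPy's seeded `default_rng` Generator. The seed `0` is arbitrary and will not reproduce the exact counts above:

```python
import numpy as np
import pandas as pd

categories = ['asthma', 'boneitis', 'diabetes', 'pneumonia']

# Seeded Generator: rerunning gives the same 1000 responses each time.
# The seed value 0 is an arbitrary choice (assumption).
rng = np.random.default_rng(0)

df = pd.DataFrame({
    # P(Yes) = P(Maybe) = (i + 1) / 12 and P(No) takes the remainder,
    # matching the distribution in the snippet above.
    cat: rng.choice(['Yes', 'Maybe', 'No'], size=1000,
                    p=[(i + 1) / 12, (i + 1) / 12, 1 - 2 * (i + 1) / 12])
    for i, cat in enumerate(categories)
})
print(df.head())
```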