I'd like to make a bar plot in Python with multiple x-categories from counts of data that are either "Yes" or "No". I've started on some code, but I believe the track I'm on is a slow way of getting to the solution I want. I'd be fine with a solution that uses seaborn, Matplotlib, or pandas, but not Bokeh, because I'd like to make publication-quality figures that scale.
Ultimately what I want is:
- a bar plot with the categories "canoe", "cruise", "kayak" and "ship" on the x-axis
- bars grouped by "color", i.e. either Green or Red
- each bar showing the proportion of "Yes" responses: the number of "Yes" rows for that color divided by the total number of rows of that color (here 4 Red and 4 Green, but that could change)
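A minimal sketch of the quantity described in the last bullet, assuming the column names and the exact "Yes"/"No" strings used in the dataset. Comparing each category column to "Yes" gives booleans, and the mean of booleans within a color group is exactly (number of "Yes" rows) / (number of rows of that color), so the denominator comes from the data rather than being typed in. The column-wise DataFrame below encodes the same eight rows as the question's list of dicts; the names `categories`, `is_yes` and `proportions` are illustrative.

```python
import pandas as pd

# Same eight rows as the question's dataset, written column-wise for brevity
df = pd.DataFrame({
    'ship':   ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
    'canoe':  ['Yes', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'],
    'cruise': ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No'],
    'kayak':  ['No', 'Yes', 'No', 'No', 'Yes', 'No', 'No', 'No'],
    'color':  ['Red', 'Green', 'Green', 'Red', 'Red', 'Green', 'Green', 'Red'],
})

categories = ['canoe', 'cruise', 'kayak', 'ship']

# True where the answer is "Yes"; one boolean column per category
is_yes = df[categories].eq('Yes')

# Mean of booleans per color = share of "Yes" among that color's rows
proportions = is_yes.groupby(df['color']).mean()
print(proportions)
```

For Red, for example, `ship` is "Yes" in 3 of its 4 rows, so that cell is 0.75.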
Here's the dataset I'm working with:
import pandas as pd
data = [
    {'ship': 'Yes', 'canoe': 'Yes', 'cruise': 'Yes', 'kayak': 'No', 'color': 'Red'},
    {'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes', 'canoe': 'No', 'color': 'Green'},
    {'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No', 'canoe': 'No', 'color': 'Green'},
    {'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No', 'canoe': 'No', 'color': 'Red'},
    {'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes', 'canoe': 'No', 'color': 'Red'},
    {'ship': 'No', 'cruise': 'Yes', 'kayak': 'No', 'canoe': 'Yes', 'color': 'Green'},
    {'ship': 'No', 'cruise': 'No', 'kayak': 'No', 'canoe': 'No', 'color': 'Green'},
    {'ship': 'No', 'cruise': 'No', 'kayak': 'No', 'canoe': 'No', 'color': 'Red'},
]
df = pd.DataFrame(data)
This is what I've started with:
print(df['color'].value_counts())
red = 4 # there must be a better way to code this rather than manually. Perhaps using len()?
green = 4
# get count per type
ca = df['canoe'].value_counts()
cr = df['cruise'].value_counts()
ka = df['kayak'].value_counts()
sh = df['ship'].value_counts()
print(ca, cr, ka, sh)
# group by color
cac = df.groupby(['canoe','color'])
crc = df.groupby(['cruise','color'])
kac = df.groupby(['kayak','color'])
shc = df.groupby(['ship','color'])
# make plots
cac2 = cac.size().unstack()  # rows: canoe No/Yes, columns: color Green/Red
cac2.plot(kind='bar', title = 'Canoe by color')
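On the hard-coded `red = 4` / `green = 4`: a minimal sketch, assuming the same columns and strings, of reading those counts from the data instead, either from `value_counts()` or with `len()` as the comment suggests. It also shows `pd.crosstab` with `normalize='index'`, which turns the per-category counts into per-color proportions in one call. Variable names such as `color_counts` and `canoe_share` are illustrative.

```python
import pandas as pd

# Same eight rows as the question's dataset, written column-wise for brevity
df = pd.DataFrame({
    'ship':   ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
    'canoe':  ['Yes', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'],
    'cruise': ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No'],
    'kayak':  ['No', 'Yes', 'No', 'No', 'Yes', 'No', 'No', 'No'],
    'color':  ['Red', 'Green', 'Green', 'Red', 'Red', 'Green', 'Green', 'Red'],
})

# Rows per color, taken from the data rather than typed in
color_counts = df['color'].value_counts()
red = color_counts['Red']
green = color_counts['Green']

# len() also works: count the rows whose color is Red
red_via_len = len(df[df['color'] == 'Red'])

# Share of "No"/"Yes" within each color for one category;
# normalize='index' divides each row by that color's total
canoe_share = pd.crosstab(df['color'], df['canoe'], normalize='index')
print(canoe_share)
```

This still handles one category at a time, so it mainly removes the manual counts; putting all categories on one axis needs the four columns combined.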
What I really want is all of the x-categories on one plot, showing only the "Yes" responses, expressed as the proportion of "Yes" rather than raw counts. Help?
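One way to get everything on a single grouped bar plot, sketched with pandas and Matplotlib (both acceptable per the question) and assuming the column names and "Yes"/"No" strings of the dataset above. The proportions are the per-color boolean means; transposing puts the four categories on the x-axis with one bar per color. The `Agg` backend, the bar colors, and the output file name `yes_by_color.pdf` are illustrative choices, not requirements.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs anywhere; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Same eight rows as the question's dataset, written column-wise for brevity
df = pd.DataFrame({
    'ship':   ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
    'canoe':  ['Yes', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'],
    'cruise': ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No'],
    'kayak':  ['No', 'Yes', 'No', 'No', 'Yes', 'No', 'No', 'No'],
    'color':  ['Red', 'Green', 'Green', 'Red', 'Red', 'Green', 'Green', 'Red'],
})

categories = ['canoe', 'cruise', 'kayak', 'ship']

# Proportion of "Yes" per color (rows) and category (columns)
proportions = df[categories].eq('Yes').groupby(df['color']).mean()

# Transpose: categories become the x-axis, each color becomes one bar per group
ax = proportions.T.plot.bar(color={'Green': 'green', 'Red': 'red'}, rot=0)
ax.set_ylabel('Proportion of "Yes"')
ax.set_ylim(0, 1)
ax.legend(title='color')
plt.tight_layout()

# Vector formats such as PDF or SVG scale cleanly for publication
plt.savefig('yes_by_color.pdf')
```

Because each denominator is the size of the color group, nothing is hard-coded: adding rows or colors changes the bars without touching the code (a new color would also need an entry in the `color` mapping).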