
I'm looking at home ownership within levels of different loan statuses, and I'd like to display this using a stacked bar chart in percentages.

I've been able to create a frequency stacked bar chart using this code:

df_trunc1=df[['loan_status','home_ownership','id']]
sub_df1=df_trunc1.groupby(['loan_status','home_ownership'])['id'].count()
sub_df1.unstack().plot(kind='bar',stacked=True,rot=1,figsize=(8,8),title="Home ownership across Loan Types")

which gives me a stacked bar chart of raw counts (image not reproduced here), but I can't figure out how to convert the chart to percentages. For example, within the Default group, I'd like to see what percentage have a mortgage, what percentage own, and so on.

Here is my groupby table for context (image not reproduced here):

Thanks!!

yogz123
  • Add your groupby data to the question as text, not a picture; it makes answering easier and more likely. – cco Nov 05 '16 at 03:08

2 Answers


I believe you need to convert the counts to percentages yourself:

import pandas as pd

d = {('Default', 'MORTGAGE'): 498, ('Default', 'OWN'): 110, ('Default', 'RENT'): 611,
     ('Fully Paid', 'MORTGAGE'): 3100, ('Fully Paid', 'NONE'): 1, ('Fully Paid', 'OTHER'): 5,
     ('Fully Paid', 'OWN'): 558, ('Fully Paid', 'RENT'): 2568,
     ('Late (16-30 days)', 'MORTGAGE'): 1101, ('Late (16-30 days)', 'OWN'): 260,
     ('Late (16-30 days)', 'RENT'): 996,
     ('Late (31-120 days)', 'MORTGAGE'): 994, ('Late (31-120 days)', 'OWN'): 243,
     ('Late (31-120 days)', 'RENT'): 1081}

# list(...) keeps this working on Python 3, where dict views are not lists.
sub_df1 = pd.DataFrame(list(d.values()), columns=['count'],
                       index=pd.MultiIndex.from_tuples(list(d.keys())))
sub_df2 = sub_df1.unstack()
sub_df2.columns = sub_df2.columns.droplevel()  # Drop `count` label.
# Divide each home-ownership column by its total, so every column sums to 1.
sub_df2 = sub_df2.div(sub_df2.sum())
# Transpose so there is one bar per home-ownership category.
sub_df2.T.plot(kind='bar', stacked=True, rot=1, figsize=(8, 8),
               title="Home ownership across Loan Types")


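# To get one bar per loan status instead, normalize within each loan status.
sub_df3 = sub_df1.unstack().T
sub_df3.index = sub_df3.index.droplevel()  # Drop `count` label.
# Columns are now loan statuses; divide each by its total so it sums to 1.
sub_df3 = sub_df3.div(sub_df3.sum())
sub_df3.T.plot(kind='bar', stacked=True, rot=1, figsize=(8, 8),
               title="Home ownership across Loan Types")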
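
As a cross-check on the per-loan-status normalization above, here is a minimal sketch that computes the same shares without transposing, by passing an explicit axis to `div`. The names `counts` and `row_pct` are introduced here for illustration, and filling the missing combinations with 0 and scaling to 0-100 are choices of this sketch rather than part of the answer.

```python
import pandas as pd

# Counts copied from the dictionary `d` in the answer above.
d = {('Default', 'MORTGAGE'): 498, ('Default', 'OWN'): 110, ('Default', 'RENT'): 611,
     ('Fully Paid', 'MORTGAGE'): 3100, ('Fully Paid', 'NONE'): 1, ('Fully Paid', 'OTHER'): 5,
     ('Fully Paid', 'OWN'): 558, ('Fully Paid', 'RENT'): 2568,
     ('Late (16-30 days)', 'MORTGAGE'): 1101, ('Late (16-30 days)', 'OWN'): 260,
     ('Late (16-30 days)', 'RENT'): 996,
     ('Late (31-120 days)', 'MORTGAGE'): 994, ('Late (31-120 days)', 'OWN'): 243,
     ('Late (31-120 days)', 'RENT'): 1081}

# Rows: loan status, columns: home ownership. Missing pairs become 0 (sketch's choice).
counts = pd.Series(list(d.values()),
                   index=pd.MultiIndex.from_tuples(list(d.keys()))).unstack(fill_value=0)

# Divide every row by its own total (axis=0 aligns the totals with the row index),
# then scale to a 0-100 range so each loan status adds up to 100.
row_pct = counts.div(counts.sum(axis=1), axis=0) * 100
```

Calling `row_pct.plot(kind='bar', stacked=True)` on the result should then draw one bar per loan status, each reaching 100.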


Alexander
  • That's still giving bars of the same relative heights as before instead of equal heights, except now the y-axis goes from 0 to 4. Any thoughts on why that is happening? – yogz123 Nov 05 '16 at 02:35
  • Could you post some sample data, e.g. `sub_df1.to_dict('list')` – Alexander Nov 05 '16 at 02:38
  • It was giving me an error using 'list' so I used 'dict' instead - let me know if that's not helpful. {('Default', 'MORTGAGE'): 498, ('Default', 'OWN'): 110, ('Default', 'RENT'): 611, ('Fully Paid', 'MORTGAGE'): 3100, ('Fully Paid', 'NONE'): 1, ('Fully Paid', 'OTHER'): 5, ('Fully Paid', 'OWN'): 558, ('Fully Paid', 'RENT'): 2568, ('Late (16-30 days)', 'MORTGAGE'): 1101, ('Late (16-30 days)', 'OWN'): 260, ('Late (16-30 days)', 'RENT'): 996, ('Late (31-120 days)', 'MORTGAGE'): 994, ('Late (31-120 days)', 'OWN'): 243, ('Late (31-120 days)', 'RENT'): 1081} – yogz123 Nov 05 '16 at 02:52
  • Thanks so much! Do you know by any chance how to organize the bars by loan status instead of home ownership? – yogz123 Nov 05 '16 at 03:47
  • Amazing! Thank you so much!! – yogz123 Nov 05 '16 at 15:44

I calculated the percentages by transposing the dataframe twice. I did it step by step to make the logic more explicit.

#transpose
to_plot =sub_df1.unstack()
to_plot_transpose = to_plot.transpose()

#calc %
to_plot_transpose_pct = to_plot_transpose.div(to_plot_transpose.sum())

#transpose back
to_plot_pct=to_plot_transpose_pct.transpose()

#plot
to_plot_pct.plot(kind='bar', stacked=True, rot=1, figsize=(8, 8),
                 title="Home ownership across Loan Types")
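
If the raw, un-aggregated rows are still at hand, a possible shortcut is `pd.crosstab` with `normalize='index'`, which counts and normalizes within each row in one call. This is only a sketch: the tiny `df` below is made-up stand-in data, not the question's data, and it assumes the real frame has `loan_status` and `home_ownership` columns as in the question's code.

```python
import pandas as pd

# Made-up stand-in for the question's raw `df` (illustrative rows only).
df = pd.DataFrame({
    "loan_status": ["Default", "Default", "Default",
                    "Fully Paid", "Fully Paid", "Fully Paid", "Fully Paid"],
    "home_ownership": ["RENT", "RENT", "OWN",
                       "MORTGAGE", "MORTGAGE", "RENT", "OWN"],
})

# normalize='index' divides each row (loan status) by its own total;
# multiplying by 100 turns the fractions into percentages.
pct = pd.crosstab(df["loan_status"], df["home_ownership"], normalize="index") * 100
```

The result has one row per loan status, so `pct.plot(kind='bar', stacked=True)` should give equal-height bars grouped by loan status.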
Jake