
I'm using plotnine to draw some plots. When I try to display a bar chart of proportions rather than counts, the fill aesthetic stops having any effect. I noticed that removing the group=1 argument makes fill work again. However, without group=1, the proportions are not calculated correctly.

Here is my function:

def plot_churn(df_):
   color_dict = {
       'Stayed': 'green',
       'Churned': 'red'
   }

   myplot = ggplot(data=df_, mapping=aes(x='Flag_Churned', fill='Flag_Churned'))
   myplot += geom_bar(mapping=aes(y="stat(prop)", group=1))
   myplot += theme(subplots_adjust={'right': 0.71})
   myplot += facet_wrap('Flag_Treat')
   myplot += scale_fill_manual(color_dict)
   myplot += scale_y_continuous(labels=percent_format())
   print(myplot)

For example, when using the following pandas DataFrame:

data = {'Churn': [0,0,0,1,1,0,1,1], 'Flag_Treat': ['treated','treated','treated','treated','not treated','not treated','not treated','not treated'],
    'Flag_Churned': ['Stayed', 'Stayed', 'Stayed', 'Churned', 'Churned', 'Stayed', 'Churned', 'Churned']}
df = pd.DataFrame(data=data)

the resulting output is not filled by 'Flag_Churned':

Plot

What am I doing wrong?

Mark Rotteveel
Jannik

1 Answer


The issue is that stat(prop) computes the proportions within each group, separately for each facet. Setting group=1 gives you the right proportions, but it overrides the grouping by fill, so the bars are no longer colored by Flag_Churned. Coming from R, I know how to do this computation on the fly in ggplot2. However, the easier approach, and the one usually recommended in R as well, is to aggregate your data before passing it to ggplot and use geom_col instead of geom_bar:

from mizani.formatters import percent_format
from plotnine import *
import pandas as pd
import numpy as np

data = {'Churn': [0,0,0,1,1,0,1,1], 'Flag_Treat': ['treated','treated','treated','treated','not treated','not treated','not treated','not treated'],
    'Flag_Churned': ['Stayed', 'Stayed', 'Stayed', 'Churned', 'Churned', 'Stayed', 'Churned', 'Churned']}
df = pd.DataFrame(data=data)

def plot_churn(df_):
   color_dict = {
       'Stayed': 'green',
       'Churned': 'red'
   }
                                                 
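   # Count rows per (Flag_Treat, Flag_Churned) pair, then divide each count by
   # the total of its Flag_Treat group so the proportions sum to 1 in each facet.
   df_ = df_.groupby(['Flag_Treat', 'Flag_Churned']).size().reset_index(name='n')
   df_['prop'] = df_['n'] / df_.groupby('Flag_Treat')['n'].transform('sum')

   myplot = ggplot(data=df_, mapping=aes(x='Flag_Churned', y='prop', fill='Flag_Churned'))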
   myplot += geom_col()
   myplot += theme(subplots_adjust={'right': 0.71})
   myplot += facet_wrap('Flag_Treat')
   myplot += scale_fill_manual(color_dict)
   myplot += scale_y_continuous(labels=percent_format())
   print(myplot)

plot_churn(df)
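
As a quick check of the arithmetic involved, here is a minimal sketch that recomputes the proportions for the example data using only the Python standard library. It is an illustration rather than part of the plotting code, and the names rows, pair_counts, treat_totals, within_group and of_all_rows are made up for this example. It contrasts dividing each count by the size of its Flag_Treat group (what each facet should show) with dividing by the total number of rows.

```python
from collections import Counter

# Same rows as the example DataFrame, as (Flag_Treat, Flag_Churned) pairs.
rows = [
    ("treated", "Stayed"), ("treated", "Stayed"),
    ("treated", "Stayed"), ("treated", "Churned"),
    ("not treated", "Churned"), ("not treated", "Stayed"),
    ("not treated", "Churned"), ("not treated", "Churned"),
]

pair_counts = Counter(rows)                          # rows per (treat, churn) pair
treat_totals = Counter(treat for treat, _ in rows)   # rows per treatment group

# Proportion of each churn status within its own treatment group.
within_group = {
    pair: n / treat_totals[pair[0]] for pair, n in pair_counts.items()
}

# Proportion relative to all rows, ignoring the treatment group.
of_all_rows = {pair: n / len(rows) for pair, n in pair_counts.items()}

for pair in sorted(pair_counts):
    print(pair, within_group[pair], of_all_rows[pair])
```

For this data the within-group proportions are 0.75 and 0.25 in each facet, whereas dividing by all eight rows gives 0.375 and 0.125.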


stefan
  • Thanks for your help. Unfortunately the numbers are not correct. You calculate `count/np.sum(count)`. But I have to calculate the percentage in the respective group, that is: `count/np.sum(count(treated))` and `count/np.sum(count(not treated))`. Any ideas how to do that? Thanks! – Jannik Jun 26 '21 at 13:25
  • Hi Gusto. Sorry. Should have realized that. I just made an edit to fix that, which however involves aggregating the data before passing it to ggplot. As I have an R background and just started or (re-)started with Python and plotnine, I don't know whether there is an easy way to compute the props on the fly. I know how to do it in ggplot2, but even there my default suggestion would be to aggregate the data before passing it to ggplot. – stefan Jun 26 '21 at 14:32
  • Thank you very much! It works like a charm :-) Aggregating the data before passing it to plotnine sounds reasonable. – Jannik Jun 26 '21 at 15:10