Grouping and pivoting DataFrame with additional column for ratio of counts

Question

I have a dataframe that looks like this:

id       status      year 
1        yes         2014
3        no          2013
2        yes         2014
4        no          2014

The actual dataframe is very large with multiple ids and years. I am trying to make a new dataframe that has the percents of 'yes's and 'no's grouped by year.

I was thinking of grouping the dataframe by the year, which would then put the statuses per year in a list and then analyzing the counts of yes's and no's that way, but I was wondering whether there is a more pythonic way to do this?

I would like for the end dataframe to look like this:

year      yes_count     no_count     ratio_yes_to_toal    
2013       0             1             0%
2014       2             1             67%

Tim · Accepted Answer · 2018-12-19T16:37:53.567

I'd suggest grouping by year and status, counting, pivoting, and then creating an additional column of the ratio:

df2 = df.groupby(['year', 'status']).count().pivot_table(index="year", columns=["status"]).fillna(0)
df2.columns = df2.columns.get_level_values(1)
df2['ratio'] = df2['yes'] / (df2['yes'] + df2['no'])

Output

status   no  yes     ratio
year                      
2013    1.0  0.0  0.000000
2014    1.0  2.0  0.666667

cs95 · Answer 2 · 2018-12-19T17:02:02.140

2

You can use pivot_table without the groupby for a faster solution.

v = df.pivot_table(index='year', columns='status', aggfunc='size').fillna(0) 
v.join((v['yes'] / v.sum(1)).to_frame('ratio'))
# Or, if you care about micro-optimizing the code, 
# v['ratio'] = v['yes'] / v.sum(1)

       no  yes     ratio
year                    
2013  1.0  0.0  0.000000
2014  1.0  2.0  0.666667

edited Dec 19 '18 at 17:02

answered Dec 19 '18 at 16:46

cs95

379,657
97
704
746

ALollz · Answer 3 · 2018-12-19T16:39:04.780

To get the counts and percentages, probably easiest to do it in two steps with groupby + value_counts + unstack, followed by the division.

df1 = df.groupby('year').status.value_counts().unstack(1).fillna(0)
df1['ratio_yes'] = df1['yes'].div(df1.sum(1), axis=0)*100

#status   no  yes  ratio_yes
#year                       
#2013    1.0  0.0   0.000000
#2014    1.0  2.0  66.666667

If you just wanted percentages, then you could add the normalize=True argument, and do it in one step:

df1 = df.groupby('year').status.value_counts(normalize=True).unstack(1).fillna(0)*100

#status          no        yes
#year                         
#2013    100.000000   0.000000
#2014     33.333333  66.666667

score 0 · Answer 4 · answered Dec 19 '18 at 17:46

`pd.crosstab`

You can cross-tabulate your dataframe, then calculate your ratio:

res = pd.crosstab(df['year'], df['status'])
res['yes_pct'] = res['yes'].div(res.sum(1))

print(res)

status  no  yes   yes_pct
year                     
2013     1    0  0.000000
2014     1    2  0.666667

Grouping and pivoting DataFrame with additional column for ratio of counts

4 Answers4

pd.crosstab

`pd.crosstab`