2

I have a pandas dataframe as follows:

   variable        Total_Beneficiaries  gender_text age_group
0   MC_OLD_18>F>5       1534    Female  5 - 18
1   MC_OLD_18>M>5       1651    Male    5 - 18
2   MC_OLD_F>0<=1       192     Female  0 - 1
3   MC_OLD_F>18         2079    Female  18 - 60
4   MC_OLD_F>=60        171     Female  Over 60
5   MC_OLD_F_Child<5    593     Female  1 - 5
6   MC_OLD_M>0<=1       264     Male    0 - 1
7   MC_OLD_M>18         1892    Male    18 - 60
8   MC_OLD_M>=60        106     Male    Over 60
9   MC_OLD_M_Child<5    602     Male    1 - 5

I want to add a column age_group_totals that will be the sum of Total_Beneficiaries across each age group. So for the first two rows the value would be 3185.

So far I have been doing this by creating a new dataframe with the sums and merging back on to the original as follows:

total_by_age = izmir_agg[['age_group','Total_Beneficiaries']].groupby('age_group').agg({'Total_Beneficiaries':np.sum}).reset_index().rename(columns={'Total_Beneficiaries':'age_group_totals'})
izmir_agg = izmir_agg.merge(total_by_age,how='left',on='age_group')

This seems clunky and I'm wondering if there is a way to more directly add this column without creating the separate dataframe. I tried this:

izmir_agg['age_group_totals'] = izmir_agg.groupby('age_group')['Total_Beneficiaries'].sum().tolist()

But it doesn't work because it returns a list of the wrong length. Any tips for how to accomplish this in one step?

Sam Lilienfeld
  • 475
  • 1
  • 5
  • 15

1 Answers1

2

Try:

izmir_agg['agg_totals'] = izmir_agg.groupby('age_group').Total_Beneficiaries.transform('sum')
izmir_agg

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624