0

I have a dataframe I am working with and I want to groupby 2 columns, aggregate (sum) another one, and return a new column to the dataframe.

When running the below I can return a whole new dataframe that gets me my results:

df.groupby(['Account', 'Year']).agg({'Dollars':'sum'})

However I want the return to be in a single column of the original dataframe. I know I could merge this new df with the original as a round-about way of doing this, but I would like to find a single line of code that would complete the task.

For reference, when trying this method I get an error due to 2 variables in the groupby:

df['Account_Total'] = df['Dollars'].groupby(['Account', 'Year']).transform('sum')

However if I run the above with only 'Account' in the groupby it runs.

The final way I have tried is:

df['Account_Total'] = df.groupby(['Account', 'Year']).agg({'Dollars':'sum'})

Which clearly does not work as I am trying to add a new dataframe as a column of a dataframe. (Returns a TypeError: incompatible index of inserted column with frame index)

Hoping there is an easy solution otherwise I will resort to the merge.

  • 3
    You are close on the transform. The error is because you are filtering dataframe only to `Dollars` before the `groupby` and resulting in groupby cannot find the 2 columns. Try this `df.groupby(['Account', 'Year']).Dollars.transform(sum)` – Emma Feb 18 '22 at 16:00
  • 1
    @Emma yup that was it. Thanks! – Luke Morris Feb 18 '22 at 16:05

0 Answers0