0

I am trying to speed up an Excel process that normally takes a few hours to hopefully get it to a few minutes with python. I set up a pivot table with pandas, and I like what I got for the output, but now I am trying to setup a new column and I cannot seem to figure out how to create it in the way that I am looking for. I already combined two columns into the pivot table, but now I am trying to sum the entries in that column for each year and insert that sum into a new column for every row of that year.

This is what my code looks like so far. It does not include any of my attempts to make this column as none of them have come close to working.

import pandas as pd
import numpy as np
    
ld = pd.read_excel('LossData.xlsx')

ld['RsvT'] = ld['RsvLoss_ITD'] + ld['RsvExpense_ITD']
    
pivot = pd.pivot_table(ld, index = [ 'AY', 'Fusion_Program', 'Fusion_Office_Adj', 'AsLob', 'LosOccurredState'], values = ['RsvT'], aggfunc=np.sum)
pivot.to_excel("test.xlsx", merge_cells=False)
pt = pd.read_excel('test.xlsx')

It gives this as the output (numbers are fudged as I am not sure I can put the real ones):

     AY  Fusion_Program  Fusion_Office_Adj  AsLob LosOccurredState  RsvT
0  2011            1999                 10     52               PA     5
1  2012            1100                 10     52               NJ     6
2  2013            1999                 10     52               FL     3
3  2014            1100                 10     52               PA     7
4  2014            1150                 10     52               FL     8
5  2014            1999                 10     52               FL     4
6  2014            1999                 10     52               PA     3
7  2014            5000                 10    171               IN     2

and I want the final output to look like this:

   AY  Fusion_Program  Fusion_Office_Adj  AsLob LosOccurredState  RsvT  YearT
0  2011            1999                 10     52               PA     5      5
1  2012            1100                 10     52               NJ     6      6
2  2013            1999                 10     52               FL     3      3
3  2014            1100                 10     52               PA     7     24
4  2014            1150                 10     52               FL     8     24
5  2014            1999                 10     52               FL     4     24
6  2014            1999                 10     52               PA     3     24
7  2014            5000                 10    171               IN     2     24

The YearT column is summing the RsvT column for each year and then spitting out that total into each row. So, for 2014, it did 7+8+4+3+2 = 24 and then put that 24 into YearT. The middle 4 columns can be ignored for the calculation but I would like them included in the final output. I know I will probably need to use the groupby function is some form, but I do not know how to get the output into this format.

SaicheS
  • 11
  • 3
  • You can use [`.groupby.transform`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html): `ld["YearT"] = ld.groupby("AY")["RsvT"].transform("np.sum")` You could use `"sum"` alternatively. – Rawson Jun 19 '23 at 17:38
  • I edited yours a little bit to this: pt['YearT'] = pt.groupby('AY')['RsvT'].transform('sum') as I should have clarified that I wanted it in the pivot table sheet. It gave me the output I was looking for though, thanks! – SaicheS Jun 19 '23 at 17:54

0 Answers0