0

I am working on building an additional column in a data frame which is the sum of two rows for one Time Period. A picture is attached here:

enter image description here

I want to create a new column which is the Sum of Lives for 'IN' and 'SA' in the 'BillType' column for each TimePeriodId. That way I will have one 'Lives Total' entry for a single TimePeriodId. I have gone through a lot of documentation and cant figure out how I would do that in this case.

code sample:

sa = pd.read_sql(sa_q1, sql_conn)

#convert TimePeriodId to string values

sa['TimePeriodId'] = sa['TimePeriodId'].astype(str)

sa = sa.loc[(sa['BillType'] =='SA') | (sa['BillType']=='IN')]#.drop(['BillType'], axis = 1)

sa.head(10).to_dict()

#the last line returns the following:

{'TimePeriodId': {1: '201811',
  2: '201811',
  4: '201812',
  5: '201812',
  9: '201901',
  11: '201901',
  13: '201902',
  14: '201902',
  17: '201903',
  18: '201903'},
 'BillType': {1: 'IN',
  2: 'SA',
  4: 'IN',
  5: 'SA',
  9: 'SA',
  11: 'IN',
  13: 'IN',
  14: 'SA',
  17: 'IN',
  18: 'SA'},
 'Lives': {1: 1067,
  2: 288028,
  4: 1058,
  5: 287501,
  9: 293560,
  11: 1068,
  13: 1089,
  14: 278850,
  17: 1076,
  18: 276961}}

Any help would be appreciated!

  • 3
    Please include the input **as text** in the question. Also, please include te expected output. – Roy2012 Jul 21 '20 at 12:57
  • 1
    I might try to post an answer, if you gave the data as copyable text instead of an image... – Serge Ballesta Jul 21 '20 at 12:57
  • Sorry, im new to python and am unsure about what you mean by "input as text"? – KeithRoberts Jul 21 '20 at 12:58
  • 1
    @KeithRoberts Post `df.to_dict()` to the question, if `df` is large post `df.head(10).to_dict()` and expected output. This makes reproducing your data locally easy. – Ch3steR Jul 21 '20 at 12:59

1 Answers1

1

You can try to use pandas.DataFrame.groupby() method to compute sum of lives for every time period. After that you can enrich sa dataframe by the computed column using pandas.DataFrame.transform() method.

>>> sa['LivesTotal'] = sa.groupby('TimePeriodId').Lives.transform('sum')
Jaroslav Bezděk
  • 6,967
  • 6
  • 29
  • 46