I am working on building an additional column in a data frame which is the sum of two rows for one Time Period. A picture is attached here:
I want to create a new column which is the Sum of Lives for 'IN' and 'SA' in the 'BillType' column for each TimePeriodId. That way I will have one 'Lives Total' entry for a single TimePeriodId. I have gone through a lot of documentation and cant figure out how I would do that in this case.
code sample:
sa = pd.read_sql(sa_q1, sql_conn)
#convert TimePeriodId to string values
sa['TimePeriodId'] = sa['TimePeriodId'].astype(str)
sa = sa.loc[(sa['BillType'] =='SA') | (sa['BillType']=='IN')]#.drop(['BillType'], axis = 1)
sa.head(10).to_dict()
#the last line returns the following:
{'TimePeriodId': {1: '201811',
2: '201811',
4: '201812',
5: '201812',
9: '201901',
11: '201901',
13: '201902',
14: '201902',
17: '201903',
18: '201903'},
'BillType': {1: 'IN',
2: 'SA',
4: 'IN',
5: 'SA',
9: 'SA',
11: 'IN',
13: 'IN',
14: 'SA',
17: 'IN',
18: 'SA'},
'Lives': {1: 1067,
2: 288028,
4: 1058,
5: 287501,
9: 293560,
11: 1068,
13: 1089,
14: 278850,
17: 1076,
18: 276961}}
Any help would be appreciated!