Create new column that is the sum of two rows, but repeat every two rows

Question

I am working on building an additional column in a data frame which is the sum of two rows for one Time Period. A picture is attached here:

I want to create a new column which is the Sum of Lives for 'IN' and 'SA' in the 'BillType' column for each TimePeriodId. That way I will have one 'Lives Total' entry for a single TimePeriodId. I have gone through a lot of documentation and cant figure out how I would do that in this case.

code sample:

sa = pd.read_sql(sa_q1, sql_conn)

#convert TimePeriodId to string values

sa['TimePeriodId'] = sa['TimePeriodId'].astype(str)

sa = sa.loc[(sa['BillType'] =='SA') | (sa['BillType']=='IN')]#.drop(['BillType'], axis = 1)

sa.head(10).to_dict()

#the last line returns the following:

{'TimePeriodId': {1: '201811',
  2: '201811',
  4: '201812',
  5: '201812',
  9: '201901',
  11: '201901',
  13: '201902',
  14: '201902',
  17: '201903',
  18: '201903'},
 'BillType': {1: 'IN',
  2: 'SA',
  4: 'IN',
  5: 'SA',
  9: 'SA',
  11: 'IN',
  13: 'IN',
  14: 'SA',
  17: 'IN',
  18: 'SA'},
 'Lives': {1: 1067,
  2: 288028,
  4: 1058,
  5: 287501,
  9: 293560,
  11: 1068,
  13: 1089,
  14: 278850,
  17: 1076,
  18: 276961}}

Any help would be appreciated!

Please include the input **as text** in the question. Also, please include te expected output. — Roy2012, Jul 21 '20 at 12:57
I might try to post an answer, if you gave the data as copyable text instead of an image... — Serge Ballesta, Jul 21 '20 at 12:57
Sorry, im new to python and am unsure about what you mean by "input as text"? — KeithRoberts, Jul 21 '20 at 12:58
@KeithRoberts Post `df.to_dict()` to the question, if `df` is large post `df.head(10).to_dict()` and expected output. This makes reproducing your data locally easy. — Ch3steR, Jul 21 '20 at 12:59

score 1 · Accepted Answer · answered Jul 21 '20 at 13:02

1

You can try to use pandas.DataFrame.groupby() method to compute sum of lives for every time period. After that you can enrich sa dataframe by the computed column using pandas.DataFrame.transform() method.

>>> sa['LivesTotal'] = sa.groupby('TimePeriodId').Lives.transform('sum')

answered Jul 21 '20 at 13:02

Jaroslav Bezděk

6,967
6
29
46

This works well but I end up with duplicates. I use.drop_duplicates() and get nan values. This is okay for me so long as I can build a graph from the df which doesn't take in the nan values – KeithRoberts Jul 21 '20 at 13:07
1

@JaroslavBezděk - hmmm, if not dupe it is OK ask for accepting, for dupes the best remve answer... – jezrael Jul 21 '20 at 13:12
Sorry, did not notice. – Jaroslav Bezděk Jul 21 '20 at 13:16

Create new column that is the sum of two rows, but repeat every two rows

1 Answers1