1

I have a dataframe with a series of days, users active each day and events for that user on that day. I want to add a column that gives me the total number of events for each user over the total time span in another column.

I can make it work with this code but I'm certain there's a more elegant way to do it. Please let me know what could be better!

df1 = pd.DataFrame({'users': ['Sara', 'James', 'Sara', 'James'],
                    'events': [3, 2, 5, 1]
                   })
df2 = df1.groupby('users').sum()
df2.rename(columns= {'events' : 'total'}, inplace=True)
df3 = pd.merge(df1, df2, how='left', on='users')

This gives me the output I want with 8 in every Sara row and 3 in every James row.

Tej
  • 86
  • 5
amanda
  • 99
  • 1
  • 3

1 Answers1

0

There is indeed, do you know about the transform method? it returns a groupby in the same format as your current dataframe

df1['total'] = df1.groupby('users').transform('sum')
print(df1)
      users  events  total
0   Sara       3      8
1  James       2      3
2   Sara       5      8
3  James       1      3

just as a test

df1 == df3
    users   events  total
0   True    True    True
1   True    True    True
2   True    True    True
3   True    True    True

more here :

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html

https://pbpython.com/pandas_transform.html

Umar.H
  • 22,559
  • 7
  • 39
  • 74