I have a dataframe with a series of days, the users active on each day, and the number of events for each user on that day. I want to add a column that gives each user's total number of events over the whole time span.
I can make it work with this code, but I'm certain there's a more elegant way to do it. Please let me know what could be better!
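import pandas as pd

df1 = pd.DataFrame({'users': ['Sara', 'James', 'Sara', 'James'],
                    'events': [3, 2, 5, 1]})
# Total events per user; 'users' becomes the index of df2
df2 = df1.groupby('users').sum()
df2.rename(columns={'events': 'total'}, inplace=True)
# Left-join the per-user totals back onto every row of df1
df3 = pd.merge(df1, df2, how='left', on='users')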
This gives me the output I want: a 'total' column with 8 in every Sara row and 3 in every James row.
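One common alternative, sketched here under the assumption that only the per-user sum of `events` is needed, is `groupby(...).transform('sum')`. Unlike a plain `sum()`, `transform` returns a Series aligned to the original rows, so the totals can be assigned directly as a new column without building a second dataframe and merging it back.

```python
import pandas as pd

df1 = pd.DataFrame({'users': ['Sara', 'James', 'Sara', 'James'],
                    'events': [3, 2, 5, 1]})

# transform('sum') computes each user's total and broadcasts it back to
# every row of that user, keeping df1's original index and row order.
df1['total'] = df1.groupby('users')['events'].transform('sum')

# total column: [8, 3, 8, 3]
print(df1)
```

This replaces the groupby/rename/merge sequence with a single assignment and leaves `df1` with the same `total` values as `df3` above.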