I have two dataframes (df1 and df2). df1 has all user IDs, purchase dates, and purchase amounts:
index user date_1 value
0 a 2021-02-05 50.0
1 b 2021-02-20 12.0
2 a 2021-03-06 45.0
3 c 2021-03-05 30.0
4 c 2021-03-15 44.0
... ... ... ...
df2 has user IDs and the date they became a user:
index user date_2
0 a 2021-02-05
1 b 2021-02-20
2 c 2021-03-06
3 d 2021-03-20
... ... ...
Goal: I am looking to add a new column to df2 that sums the value of each user's first month of transactions.
I found the following thread:
Pandas group by then count sum...
This keeps everything in a single dataframe, however, and I am getting lost trying to mix the apply, lambda, groupby, and sum functions. I have gotten to the following setup, but it returns incorrect values for the new column:
def f(x, y, t):
return x.apply(lambda y: x.loc[x['date_1'].between(y['date_2'],
y['date_2'] + t,
inclusive=False),'value'].sum(),axis=1)
df2['monthly_volume'] = df1.groupby('user', group_keys=False).
apply(f, df2,
pd.Timedelta(30, unit='D'))
Eventually, as I get more data, I would also look to find the purchase volume of users in their second month, third month, etc. after signing up. Maybe there is a better of going about this to make it repeatable in addition to fixing it. Appreciate the help!