2

How can I calculate the first visited date and the last visited date before an order was placed by the user?

USER ID TYPE    DATE
1   Visited September 14, 2020
1   Visited October 4, 2020
1   Visited October 24, 2020
1   Ordered November 1, 2020
2   Visited September 14, 2020
2   Visited October 1, 2020
3   Visited September 1, 2020
3   Visited October 4, 2020
3   Visited October 4, 2020
3   Visited October 19, 2020
3   Ordered January 1, 2021
3   Visited February 11, 2021
3   Visited February 24, 2021
3   Visited March 1, 2021
3   Ordered April 21, 2021

Expected Output:

USER ID Ordered MIN DATE    MAX DATE
1   1   September 14, 2020  October 24, 2020
2   0   September 14, 2020  NAT
3   1   September 1, 2020   October 19, 2020
3   2   February 11, 2021   March 1, 2021
Blogger22
  • 23
  • 5

1 Answers1

1

Try:

df['DATE'] = pd.to_datetime(df['DATE'])

df_out = df.assign(grp=(df['TYPE'] == 'Ordered')[::-1].cumsum())\
           .set_index(['USER ID', 'grp', 'TYPE'], append=True)['DATE']\
           .unstack('TYPE')\
           .groupby(['USER ID', 'grp'], sort=False)\
           .agg(Ordered=('Ordered','count'), 
                MIN_DATE=('Visited','first'), 
                MAX_DATE=('Visited','last'))\
           .reset_index('grp', drop=True)\
           .reset_index()

df_out['MAX_DATE'] = df_out['MAX_DATE'].mask(df_out['Ordered'] == 0)
df_out['Ordered'] = df_out['Ordered'].groupby(df_out['USER ID']).cumsum()

df_out['MIN_DATE'] = df_out['MIN_DATE'].dt.strftime('%B %d, %Y')
df_out['MAX_DATE'] = df_out['MAX_DATE'].dt.strftime('%B %d, %Y')

Output:

   USER ID  Ordered            MIN_DATE          MAX_DATE
0        1        1  September 14, 2020  October 24, 2020
1        2        0  September 14, 2020               NaN
2        3        1  September 01, 2020  October 19, 2020
3        3        2   February 11, 2021    March 01, 2021
Scott Boston
  • 147,308
  • 15
  • 139
  • 187