0

I have data which looks like the below. For each customer, I want to get the spend increase (or decrease) in the last half of 2020 (202007-202012), vs the first half of 2020 (202001-202006). Is there a quick way to do this in pandas?

Customer Date (YYYYMM) Spend
A 202001 $200
B 202001 $400
A 202002 $500
A 202003 $200
A 202004 $700
B 202002 $100
C 202001 $50

...

Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
Programmer
  • 1,266
  • 5
  • 23
  • 44

1 Answers1

1

Added few rows for second half of the year to the provided input.

Input

    Customer    Date (YYYYMM)   Spend
0   A           2020-01-01  $200.0
1   B           2020-01-01  $400.0
2   A           2020-02-01  $500.0
3   A           2020-03-01  $200.0
4   A           2020-04-01  $700.0
5   B           2020-02-01  $100.0
6   C           2020-01-01  $50.0
7   A           2020-07-01  $2000.0
8   B           2020-08-01  $250.0
9   C           2020-09-01  $40.0

Code

df['Date (YYYYMM)'] = pd.to_datetime(df['Date (YYYYMM)'], 
                                     format='%Y%m')
df["Spend"] = df["Spend"].str.replace('\$','', regex=True).astype(float)
df = df.groupby(['Customer', pd.Grouper(key='Date (YYYYMM)', 
            freq='6M', closed='left')]).sum().reset_index()
df['SpendDiff'] = df.groupby('Customer', as_index=False).Spend.diff()
df['IncDec'] = np.where(df.SpendDiff>=0,'Increase','Decrease')
df['IncDec'] = df.IncDec.mask(df.SpendDiff.isna(), '')
df
## refomatting Date column to YYYYMM and Spend 
df['Date (YYYYMM)'] = df['Date (YYYYMM)'].dt.to_period('6M').astype(str).replace('-','', regex=True)
df["Spend"] = df.Spend.astype(str).replace(r'^','$', regex=True)
df

Output

    Customer    Date (YYYYMM)   Spend   SpendDiff   IncDec
0   A           202006          $1600.0 NaN 
1   A           202012          $2000.0 400.0       Increase
2   B           202006          $500.0  NaN 
3   B           202012          $250.0  -250.0      Decrease
4   C           202006          $50.0   NaN 
5   C           202012          $40.0   -10.0       Decrease
Utsav
  • 5,572
  • 2
  • 29
  • 43
  • Thank you @Uts! Do you know if there is a way to do the same comparison for Jan-March 2021 (202101-202103) vs July-December 2020 (202007-202012)? – Programmer May 13 '21 at 03:56
  • In that case, we will have to divide df into 2 parts. Say `df1 with dates before 2021 grouped on 6M freq` and `df2 with dates after 2021 grouped on 3M freq` then we can `concat/append` both `df1 and df2` and then `group concatenated df on customer` and `apply diff()` over it – Utsav May 13 '21 at 04:18