0

I have some customer data over dates and I want to see if for example they choose another product over time. Ideally, i'd like to copy the two columns where the changes occurred into a new column.

So, if I had a table like

period, Customer , product
2020-01, Cust1, 12 TS
2020-02, Cust1, 12 TS
2020-03, Cust1, 14 SLM
2020-01, Cust2, 12 SLM
2020-02, Cust2, 12 TS
2020-03, Cust2, 14 SLM

So cust1 went over time from TS to SLM, whereas Cust2 went from SLM to TS then the opposite. The final column should look like:

period, Customer , product , change
2020-01, Cust1, 12 TS , NAN
2020-02, Cust1, 12 TS , NAN
2020-03, Cust1, 14 SLM, from TS to SLM
2020-01, Cust2, 12 SLM, NAN
2020-02, Cust2, 12 TS, from SLM to TS
2020-03, Cust2, 14 SLM, from TS to SLM

I have look in many solutions avaliable like here, but I couldn't manage to do it the way I wanted.

Sam Al-Ghammari
  • 1,021
  • 7
  • 23

2 Answers2

1

We can do this in a number of ways, I would suggest using shift and groupby to find the max record and then .loc to filter your query set appropriately.

setup.

from io import StringIO
import pandas as pd


d = """period, Customer, quantity , product
2020-01, Cust1, 12, TS
2020-02, Cust1, 12, TS
2020-03, Cust1, 14, SLM
2020-01, Cust2, 12, SLM
2020-02, Cust2, 12, TS
2020-03, Cust2, 14, SLM"""

df = pd.read_csv(StringIO(d),sep=',',parse_dates=['period'])
# as you have spaces in your csv above.
#df.columns = df.columns.str.strip()

#create a record end date. 
df['period_end_date'] = df.groupby('Customer')['period'].shift(-1)
#find the previous product. 
df.loc[df['period_end_date'].isna(), 
                             'previous_product'] = df.groupby('Customer')['product'].shift(1)

the current record here will be where the preiod_end_date is null.

print(df)

      period Customer  quantity product period_end_date previous_product
0 2020-01-01    Cust1        12      TS      2020-02-01              NaN
1 2020-02-01    Cust1        12      TS      2020-03-01              NaN
2 2020-03-01    Cust1        14     SLM             NaT               TS
3 2020-01-01    Cust2        12     SLM      2020-02-01              NaN
4 2020-02-01    Cust2        12      TS      2020-03-01              NaN
5 2020-03-01    Cust2        14     SLM             NaT               TS

if you need it in a pre-defined format as you've outlined above.

df.loc[df['period_end_date'].isna(), 
                            'previous_product'] = ("FROM " 
                            + df.groupby('Customer')['product'].shift(1) 
                            + " TO " 
                            + df['product'] )

      period Customer  quantity product period_end_date  previous_product
0 2020-01-01    Cust1        12      TS      2020-02-01               NaN
1 2020-02-01    Cust1        12      TS      2020-03-01               NaN
2 2020-03-01    Cust1        14     SLM             NaT  FROM  TS TO  SLM
3 2020-01-01    Cust2        12     SLM      2020-02-01               NaN
4 2020-02-01    Cust2        12      TS      2020-03-01               NaN
5 2020-03-01    Cust2        14     SLM             NaT  FROM  TS TO  SLM
Umar.H
  • 22,559
  • 7
  • 39
  • 74
0

We can first group the dataframe by Customer, then shift to check if there is a change. After that we can compare and determine the change.

df['prev_product'] = df.groupby(['Customer'])['product'].shift().bfill()

df['change'] = df[['product', 'prev_product']].apply(lambda x: None if(x[0] == x[1]) else f'from {x[1]}  to {x[0]}', axis=1)
    period  Customer    n   product prev_product    change
0   2020-01 Cust1       12  TS      TS              None
1   2020-02 Cust1       12  TS      TS              None
2   2020-03 Cust1       14  SLM     TS              from TS to SLM
3   2020-01 Cust2       12  SLM     SLM             None
4   2020-02 Cust2       12  TS      SLM             from SLM to TS
5   2020-03 Cust2       14  SLM     TS              from TS to SLM
​

NOTE: df.drop('prev_product',axis=1) incase not required.

Epsi95
  • 8,832
  • 1
  • 16
  • 34