Pandas shift that takes into account groups

Question

I have chronological data (monthly aggregation per customer).

df=pd.DataFrame({'cust_id': [1,1,1,1,1,1,2,2,2,2,2],
                 'period' : [200010,200011,200012,200101,200102,200103,200010,200011,200012,200101,200103],
                 'volume' : [1,2,3,4,5,6,7,8,9,10,12],
                 'num_transactions': [3,4,5,6,7,8,9,10,11,12,13],
                 'label': [1,1,1,0,1,1,0,0,0,0,0]})

The dataframe is sorted out by user and month, ascending.

There is a column "label" which is, essentially, a categorical variable.

I want to introduce a column "next_month_label" where I store the label value for the next month for that user.

I used shift and then I realised that it does not consider the fact that the data for customer1 is then followed by that of customer2. So, essentially, the last row for customer1 is "borrowing" the label of the first row of customer2. Instead, the field "next_month_label" for the last row of customer1 should stay empty / null.

How to do that?

The expected result should look like this:

df=pd.DataFrame({'cust_id': [1,1,1,1,1,1,2,2,2,2,2],
                 'period' : [200010,200011,200012,200101,200102,200103,200010,200011,200012,200101,200103],
                 'volume' : [1,2,3,4,5,6,7,8,9,10,12],
                 'num_transactions': [3,4,5,6,7,8,9,10,11,12,13],
                 'label': [1,1,1,0,1,1,0,0,0,0,0],
                 'next_month_label': [1,1,0,1,1,NaN,0,0,0,0,NaN],
})

score 1 · Accepted Answer · answered Jul 20 '23 at 10:37

Let me know if this code gives you required result:

df=pd.DataFrame({'cust_id': [1,1,1,1,1,1,2,2,2,2,2],
                 'period' : [200010,200011,200012,200101,200102,200103,200010,200011,200012,200101,200103],
                 'volume' : [1,2,3,4,5,6,7,8,9,10,12],
                 'num_transactions': [3,4,5,6,7,8,9,10,11,12,13],
                 'label': [1,1,1,0,1,1,0,0,0,0,0]})

df['next_month_label'] = df.groupby('cust_id')['label'].shift(-1)

print(df)

 cust_id  period  volume  num_transactions  label  next_month_label
0         1  200010       1                 3      1               1.0
1         1  200011       2                 4      1               1.0
2         1  200012       3                 5      1               0.0
3         1  200101       4                 6      0               1.0
4         1  200102       5                 7      1               1.0
5         1  200103       6                 8      1               NaN
6         2  200010       7                 9      0               0.0
7         2  200011       8                10      0               0.0
8         2  200012       9                11      0               0.0
9         2  200101      10                12      0               0.0
10        2  200103      12                13      0               NaN

If this is what is needed, this is already well described [here](https://stackoverflow.com/questions/53335567/use-pandas-shift-within-a-group) — mozway, Jul 20 '23 at 10:38

Pandas shift that takes into account groups

1 Answers1