I have chronological data (monthly aggregation per customer).
df=pd.DataFrame({'cust_id': [1,1,1,1,1,1,2,2,2,2,2],
'period' : [200010,200011,200012,200101,200102,200103,200010,200011,200012,200101,200103],
'volume' : [1,2,3,4,5,6,7,8,9,10,12],
'num_transactions': [3,4,5,6,7,8,9,10,11,12,13],
'label': [1,1,1,0,1,1,0,0,0,0,0]})
The dataframe is sorted out by user and month, ascending.
There is a column "label" which is, essentially, a categorical variable.
I want to introduce a column "next_month_label" where I store the label value for the next month for that user.
I used shift and then I realised that it does not consider the fact that the data for customer1 is then followed by that of customer2. So, essentially, the last row for customer1 is "borrowing" the label of the first row of customer2. Instead, the field "next_month_label" for the last row of customer1 should stay empty / null.
How to do that?
The expected result should look like this:
df=pd.DataFrame({'cust_id': [1,1,1,1,1,1,2,2,2,2,2],
'period' : [200010,200011,200012,200101,200102,200103,200010,200011,200012,200101,200103],
'volume' : [1,2,3,4,5,6,7,8,9,10,12],
'num_transactions': [3,4,5,6,7,8,9,10,11,12,13],
'label': [1,1,1,0,1,1,0,0,0,0,0],
'next_month_label': [1,1,0,1,1,NaN,0,0,0,0,NaN],
})