Python Pandas - Shift grouped values to next date

Question

I'm trying to build a model for timeseries analysis and I want to create some features which contain values of previous rows.

I have a dataset as follows:

item_id	shop_id	items_sold	date_block_id
1	1	5	0
1	2	10	0
1	1	7	1
2	2	8	1
2	1	5	1

And I want to add a column containing the amount of items sold for each item/shop combination, from the previous date block. So something like:

item_id	shop_id	items_sold	date_block_id	lag1_items_sold
1	1	5	0	0
1	2	10	0	0
1	1	7	1	5
2	2	8	1	0
2	1	5	1	0

I think what makes this hard is the fact that if no sale is made within a date block, no entry of that item/shop combination is present in the table. I have tried al sorts of combinations of grouping and shifting, but I can't seem to get the result I want.

score 0 · Answer 1 · answered Dec 07 '22 at 15:55

0

Look at df.shift combined with group by. Docs for shift here.

Older answer with code here. Try searching through SO next time and you might find the answer!

answered Dec 07 '22 at 15:55

piedpiper

328
2
13

score 0 · Answer 2 · answered Dec 07 '22 at 18:56

Try:

df["lag1_items_sold"] = (
    df.groupby(["item_id", "shop_id"])["items_sold"].shift().fillna(0)
)

print(df)

Prints:

   item_id  shop_id  items_sold  date_block_id  lag1_items_sold
0        1        1           5              0              0.0
1        1        2          10              0              0.0
2        1        1           7              1              5.0
3        2        2           8              1              0.0
4        2        1           5              1              0.0

Python Pandas - Shift grouped values to next date

2 Answers2