1

I'm trying to build a model for timeseries analysis and I want to create some features which contain values of previous rows.

I have a dataset as follows:

item_id shop_id items_sold date_block_id
1 1 5 0
1 2 10 0
1 1 7 1
2 2 8 1
2 1 5 1

And I want to add a column containing the amount of items sold for each item/shop combination, from the previous date block. So something like:

item_id shop_id items_sold date_block_id lag1_items_sold
1 1 5 0 0
1 2 10 0 0
1 1 7 1 5
2 2 8 1 0
2 1 5 1 0

I think what makes this hard is the fact that if no sale is made within a date block, no entry of that item/shop combination is present in the table. I have tried al sorts of combinations of grouping and shifting, but I can't seem to get the result I want.

Henkie
  • 9
  • 3

2 Answers2

0

Look at df.shift combined with group by. Docs for shift here.

Older answer with code here. Try searching through SO next time and you might find the answer!

piedpiper
  • 328
  • 2
  • 13
0

Try:

df["lag1_items_sold"] = (
    df.groupby(["item_id", "shop_id"])["items_sold"].shift().fillna(0)
)

print(df)

Prints:

   item_id  shop_id  items_sold  date_block_id  lag1_items_sold
0        1        1           5              0              0.0
1        1        2          10              0              0.0
2        1        1           7              1              5.0
3        2        2           8              1              0.0
4        2        1           5              1              0.0
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91