I'm trying to build a model for timeseries analysis and I want to create some features which contain values of previous rows.
I have a dataset as follows:
item_id | shop_id | items_sold | date_block_id |
---|---|---|---|
1 | 1 | 5 | 0 |
1 | 2 | 10 | 0 |
1 | 1 | 7 | 1 |
2 | 2 | 8 | 1 |
2 | 1 | 5 | 1 |
And I want to add a column containing the amount of items sold for each item/shop combination, from the previous date block. So something like:
item_id | shop_id | items_sold | date_block_id | lag1_items_sold |
---|---|---|---|---|
1 | 1 | 5 | 0 | 0 |
1 | 2 | 10 | 0 | 0 |
1 | 1 | 7 | 1 | 5 |
2 | 2 | 8 | 1 | 0 |
2 | 1 | 5 | 1 | 0 |
I think what makes this hard is the fact that if no sale is made within a date block, no entry of that item/shop combination is present in the table. I have tried al sorts of combinations of grouping and shifting, but I can't seem to get the result I want.