I have a data set covering several months (from JAN-15 to SEP-17), reporting each customer's financial situation for each month. My task is to predict the cumulative sales for each customer over the next 12 months.
My dataset looks like this (this is the raw data; for training I will create lagged features):
Month CustomerID NetSales
JAN-15 A 10
JAN-15 B 10
JAN-15 C 10
FEB-15 A 10
FEB-15 B 10
FEB-15 C 10
...
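For concreteness, here is a minimal sketch of how the 12-month target could be built from this layout with pandas. The toy data, the helper `forward_12m_sum` and the column name `Target12M` are hypothetical, and the sketch assumes every customer has exactly one row per month with no gaps.

```python
import pandas as pd

# Hypothetical toy data in the same layout as the table above:
# one row per customer per month, JAN-15 to SEP-17 (33 months).
months = pd.date_range("2015-01-01", "2017-09-01", freq="MS")
df = pd.DataFrame({
    "Month": months.repeat(3),
    "CustomerID": ["A", "B", "C"] * len(months),
    "NetSales": 10,
})

def forward_12m_sum(s):
    """Sum of the 12 rows strictly after each row (NaN if fewer than 12 remain).

    Assumes the series is sorted by month and has no missing months.
    """
    # Reverse, take a trailing 12-row sum, reverse back: sum over t .. t+11.
    # Shifting by -1 then moves the window to t+1 .. t+12.
    return s[::-1].rolling(12, min_periods=12).sum()[::-1].shift(-1)

df = df.sort_values(["CustomerID", "Month"])
df["Target12M"] = df.groupby("CustomerID")["NetSales"].transform(forward_12m_sum)

# Last month for which a full 12-month target exists
last_labelled = df.loc[df["Target12M"].notna(), "Month"].max()
print(last_labelled)
```

With data ending in SEP-17, the last month that still has 12 full months after it is SEP-16, which is why the labelled rows stop there.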
How can I split it into TRAIN / VAL / TEST in a way that is consistent with time? Can I do something like this?
- TRAIN --> all customers, months from JAN-15 to MAR-16 (I include each calendar month at least once so the model can learn seasonal patterns)
- VAL --> all customers, months from APR-16 to JUN-16
- TEST --> all customers, months from JUL-16 to SEP-16 (I stop here because I need the following 12 months to create the target variable)
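The split described in the list above could be sketched in pandas roughly as follows. This is only an illustration under assumptions: the toy data and the variable names (`train_end`, `val_end`, `test_end`) are hypothetical, and `Month` is assumed to be stored as a month-start timestamp.

```python
import pandas as pd

# Hypothetical toy data in the same layout as the raw table
months = pd.date_range("2015-01-01", "2017-09-01", freq="MS")
df = pd.DataFrame({
    "Month": months.repeat(3),
    "CustomerID": ["A", "B", "C"] * len(months),
    "NetSales": 10,
})

# Last month (inclusive) of each split, as proposed in the list
train_end = pd.Timestamp("2016-03-01")
val_end = pd.Timestamp("2016-06-01")
test_end = pd.Timestamp("2016-09-01")

# Every customer appears in every split; only the month decides membership
train = df[df["Month"] <= train_end]
val = df[(df["Month"] > train_end) & (df["Month"] <= val_end)]
test = df[(df["Month"] > val_end) & (df["Month"] <= test_end)]

# Sanity checks: the splits are ordered in time and do not overlap,
# and TRAIN contains all 12 calendar months
assert train["Month"].max() < val["Month"].min()
assert val["Month"].max() < test["Month"].min()
assert train["Month"].dt.month.nunique() == 12

print(len(train), len(val), len(test))
```

The sketch only separates rows by their own month. It does not check whether the 12-month target windows of rows in different splits overlap.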
Is this a consistent split strategy? If not, what would you advise instead?
Thanks a lot, Andrea