Evaluating a day-ahead prediction model: should my train/test split be 80:20 or (rest of the days : last day)?

Question

I have three months of time series data at 15-minute intervals (96 time slots per day), with a temperature column [Temp] and a solar irradiance (sun intensity) column [SI]. My model has to predict temperature on a day-ahead basis for the entire day, i.e. I have to predict 96 time slots given data up to the previous day. When I'm evaluating the model myself and splitting the data into train and test sets, how should I split them? Do I do an 80:20 split? In that case my test data will contain more than one day's readings. Or do I use (3 months - 1 day) as train and test only on the last day?

score 1 · Answer 1 · answered Feb 02 '19 at 08:38

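That depends on your task. Either way, it is highly recommended not to mix old and new data in the train set: split chronologically, so that every training observation comes before the observations you test on.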
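One way to combine the two options from the question is a walk-forward evaluation: hold out the last few days, and predict each held-out day using only the data before it. The sketch below is a minimal illustration, not a definitive recipe. The function name `day_ahead_splits`, the 90-day total, and the 18 held-out days (roughly 20%) are assumptions made for the example; only the 96 slots per day comes from the question.

```python
SLOTS_PER_DAY = 96  # 15-minute intervals, as stated in the question


def day_ahead_splits(n_days, n_test_days, slots_per_day=SLOTS_PER_DAY):
    """Yield (train_idx, test_idx) index ranges, one pair per held-out day.

    Each test day is predicted from the rows that precede it only,
    so old and new data are never mixed in the training set.
    """
    if not 0 < n_test_days < n_days:
        raise ValueError("n_test_days must be between 1 and n_days - 1")
    first_test_day = n_days - n_test_days
    for day in range(first_test_day, n_days):
        start = day * slots_per_day
        train_idx = range(0, start)                     # all rows before the test day
        test_idx = range(start, start + slots_per_day)  # the slots of the test day
        yield train_idx, test_idx


# Assumed sizes for illustration: 90 days of data, last 18 days held out.
splits = list(day_ahead_splits(n_days=90, n_test_days=18))
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
print(len(splits), len(first_train), len(first_test), len(last_train))
```

Setting `n_test_days=1` gives the (rest of the days : last day) split. A larger value gives an 80:20-style hold-out in which every test day is still forecast one day ahead, and the error can be averaged over the held-out days.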

There are several links that you may find useful:

http://francescopochetti.com/pythonic-cross-validation-time-series-pandas-scikit-learn/

https://stats.stackexchange.com/questions/117350/how-to-split-dataset-for-time-series-prediction

https://stats.stackexchange.com/questions/346907/splitting-time-series-data-into-train-test-validation-sets
