Best way to perform time series cross-validation with irregular observation dates and an uneven number of observations per date?
I have a dataset that I am trying to use for XGBoost regression. The problem I am encountering is how best to apply time series cross-validation (or group time series cross-validation) to my train and test sets.
My dataset includes the target variable, the date of observation, and the feature values for that date of observation. There are on average 5 target observations per observation date; however, there are dates where 4 or 10 observations were recorded. Most dates have 5 observations.
I found the following question/answer, which I think could work for my use case. However, it would require me to trim the target variable's observations on days with more than 4 observations, so that each date has exactly 4 target observations.
Split time series with multiple records per day
Is there an appropriate method for determining which observations to remove, so that every observation date has exactly 4 observations? Or, if possible, is there a way to keep all observations and perform group time series cross-validation on the entire dataset?
I can't do a random split, so I split my dataset into train/test based on a specific date index (70/30 split).
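To make the second option concrete, here is a minimal sketch of splitting on unique dates instead of on rows, so that dates with 4, 5, or 10 observations never need trimming. It assumes scikit-learn's `TimeSeriesSplit` is acceptable as the underlying splitter; the helper name `date_grouped_time_series_split` and the toy dates are hypothetical and not taken from my real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def date_grouped_time_series_split(dates, n_splits=5):
    """Yield (train_idx, test_idx) row positions, splitting on unique dates.

    Hypothetical helper: every row sharing a date lands on the same side
    of a split, regardless of how many observations that date has.
    """
    date_values = pd.to_datetime(dates).to_numpy()
    unique_dates = np.unique(date_values)  # sorted unique dates
    tscv = TimeSeriesSplit(n_splits=n_splits)
    for train_d, test_d in tscv.split(unique_dates):
        # Map the selected dates back to the rows that carry them.
        train_mask = np.isin(date_values, unique_dates[train_d])
        test_mask = np.isin(date_values, unique_dates[test_d])
        yield np.flatnonzero(train_mask), np.flatnonzero(test_mask)


# Toy data (made up): 6 dates with an uneven number of rows per date.
obs_dates = pd.date_range("2020-01-01", periods=6).repeat([4, 5, 10, 5, 4, 5])
splits = list(date_grouped_time_series_split(obs_dates, n_splits=3))

for train_idx, test_idx in splits:
    print(len(train_idx), "train rows,", len(test_idx), "test rows")
```

Since scikit-learn's `cv` argument accepts an iterable of `(train, test)` index arrays, a list built this way could presumably be passed to `cross_val_score` or `GridSearchCV` together with an `XGBRegressor`, though I have not verified that this is the best approach.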
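For reference, a rough sketch of a date-based split of this kind. It assumes the DataFrame is indexed by observation date and that the 70/30 ratio is taken over unique dates rather than rows; the function name `split_by_date` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def split_by_date(df, train_frac=0.7):
    """Split a date-indexed frame so no date straddles train and test."""
    df = df.sort_index()
    unique_dates = df.index.unique()
    n_train = int(len(unique_dates) * train_frac)
    cutoff = unique_dates[n_train]  # first date that belongs to the test set
    train = df[df.index < cutoff]
    test = df[df.index >= cutoff]
    return train, test


# Toy frame (made up): 10 dates with 4, 5, or 10 rows each.
counts = [4, 5, 5, 10, 5, 4, 5, 5, 5, 4]
toy_index = pd.date_range("2020-01-01", periods=10).repeat(counts)
toy = pd.DataFrame({"Target": np.arange(len(toy_index), dtype=float)},
                   index=toy_index)

train, test = split_by_date(toy, train_frac=0.7)
print(len(train), "train rows,", len(test), "test rows")
```

Because the cutoff is chosen among dates, the row-level ratio will only be approximately 70/30 when the number of observations per date varies.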
This is an example of my dataframe:
Target Feature dayofmonth weekofyear
Obs. Date ...
2008-06-16 140.2 25 ... 16
2008-06-16 140.7 25 ... 16
2008-06-16 139.0 25 ... 16
2008-06-16 144.5 25 ... 16
2018-09-04 64.9 36 ... 4
2018-09-04 72.9 36 ... 4
2018-09-04 75.6 36 ... 4
2018-09-04 71.6 36 ... 4
2018-09-04 74.9 36 ... 4
[618 rows x 46 columns]