I'm trying to run auto-sklearn with a custom cross-validation strategy. I originally implemented it for scikit-learn, where it works, but now I want to give auto-sklearn a try.
My cross-validation strategy is a simple generator function that takes the number of splits and yields train/validation indices. The goal is that each validation split contains the data for a single year, and the training data is everything that comes before that year:
import numpy as np

def cross_validate_temporal(df, n=3, seed=None):
    # Assume df is sorted by date; df['date'] holds date strings formatted as 'yyyy-mm-dd'
    assert n > 1
    np.random.seed(seed)
    # The latest I can use is 2018
    years = [str(2018 - i) for i in range(n, 0, -1)]
    for year in years:
        # Positions of the rows that fall in the validation year
        val_idx = np.argwhere((df['date'].apply(lambda x: x[:4] == year)).values).reshape(-1)
        # All rows before the first validation row, in shuffled order
        train_idx = np.arange(np.min(val_idx))
        np.random.shuffle(train_idx)
        yield train_idx, val_idx
The actual splitting logic is a little more complex than this, so I can't use sklearn's TimeSeriesSplit out of the box.
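In case it helps, here is the same split logic wrapped as a splitter object, as a rough sketch only. It exposes `split` and `get_n_splits`, which is the interface scikit-learn's cross-validators provide. The class name `TemporalYearSplit` and the `end_year` parameter are names I made up for illustration, and it takes the date strings directly instead of a DataFrame so that it only needs NumPy. I have not confirmed whether auto-sklearn accepts an object like this; that probably depends on the installed version.

```python
import numpy as np


class TemporalYearSplit:
    """Hypothetical splitter: one validation fold per year, trained on all earlier rows."""

    def __init__(self, dates, n_splits=3, end_year=2018, seed=None):
        # dates: 'yyyy-mm-dd' strings, assumed to be sorted in ascending order
        self.years = np.array([int(d[:4]) for d in dates])
        self.n_splits = n_splits
        self.end_year = end_year  # exclusive, mirrors the 2018 constant in the function
        self.seed = seed

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X=None, y=None, groups=None):
        rng = np.random.RandomState(self.seed)
        for year in range(self.end_year - self.n_splits, self.end_year):
            # Positions of the rows that fall in the validation year
            val_idx = np.flatnonzero(self.years == year)
            if val_idx.size == 0:
                raise ValueError(f"no rows found for validation year {year}")
            # All rows before the first validation row, in shuffled order
            train_idx = rng.permutation(int(val_idx.min()))
            yield train_idx, val_idx


# Toy data, only to show the shape of the folds
dates = ["2014-01-01", "2014-06-01", "2015-01-01",
         "2015-06-01", "2016-01-01", "2017-01-01"]
splitter = TemporalYearSplit(dates, n_splits=3, seed=0)
splits = list(splitter.split())
for train_idx, val_idx in splits:
    print(np.sort(train_idx), val_idx)
```

Using a local `RandomState` instead of `np.random.seed` keeps the shuffling from touching the global random state, which seemed safer if the splitter ends up being called from inside another library.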
So, the question is: how do I use this custom strategy with auto-sklearn?