I would like to show an example of a model that overfits its test set and does not generalize well to future data.
I split the news dataset into 3 sets:
train set length: 11314
test set length: 5500
future set length: 2031
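For reference, the split looks roughly like this (I am using the scikit-learn 20 newsgroups loader, whose standard train subset has 11314 documents; the test/future split below is a sketch, so the exact sizes may differ slightly from mine):

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.model_selection import train_test_split

    train = fetch_20newsgroups(subset='train')   # 11314 documents
    rest = fetch_20newsgroups(subset='test')     # held-out documents

    X_train, y_train = train.data, train.target
    # Split the held-out documents into a test set (used for model
    # selection) and a "future" set that is never touched until the end.
    X_test, X_future, y_test, y_future = train_test_split(
        rest.data, rest.target, test_size=2031, random_state=0)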
Since it is a text dataset, I build a CountVectorizer from it.
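The vectorizer step looks roughly like this (the parameter values shown are just placeholders for the ones the grid search tries):

    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer(min_df=1, max_df=1.0, binary=False)
    X_train_vec = vectorizer.fit_transform(X_train)  # fit on train only
    X_test_vec = vectorizer.transform(X_test)
    X_future_vec = vectorizer.transform(X_future)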
I am running a grid search (without cross-validation); each loop tests some parameters of the vectorizer ('min_df', 'max_df') and some parameters of my LogisticRegression model ('C', 'fit_intercept', 'tol', ...).
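The loop looks roughly like this (the grids below are placeholders, not the exact values I searched); note that the best combination is chosen by its score on the test set, with no cross-validation:

    from itertools import product

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import ParameterGrid

    vec_grid = ParameterGrid({'min_df': [1, 2, 5],
                              'max_df': [0.5, 1.0],
                              'binary': [False, True]})
    clf_grid = ParameterGrid({'C': [0.01, 0.1, 1.0],
                              'fit_intercept': [True, False],
                              'tol': [1e-4, 1e-3]})

    best_params, best_score = None, -1.0
    for vec_params, clf_params in product(vec_grid, clf_grid):
        vec = CountVectorizer(**vec_params)
        Xtr = vec.fit_transform(X_train)   # vectorizer fit on train only
        Xte = vec.transform(X_test)
        clf = LogisticRegression(**clf_params).fit(Xtr, y_train)
        score = clf.score(Xte, y_test)     # model selection on the test set
        if score > best_score:
            best_params, best_score = (vec_params, clf_params), score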
The best result I get is:
vectorizer params:  {'binary': False, 'max_df': 1.0, 'min_df': 1}
model params:       {'C': 0.1, 'fit_intercept': True, 'tol': 0.0001}
test set score:     0.64018181818181819
training set score: 0.92902598550468451
But if I now run it on the future set, I get a score similar to the test set score:
clf.score(X_future, y_future): 0.6509108813392418
How can I construct a case where I overfit the test set, so that the model does not generalize well to future data?
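To make the failure mode I am after concrete, here is a toy sketch on purely synthetic noise (all sizes and grids are made up): with a very small test set and many candidate models, selecting by test score picks up noise, so the selected test score ends up optimistic compared to fresh data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    # Pure-noise features: no model can genuinely beat 50% accuracy.
    X = rng.randn(300, 500)
    y = rng.randint(0, 2, 300)
    X_tr, y_tr = X[:100], y[:100]
    X_te, y_te = X[100:150], y[100:150]    # tiny test set: 50 samples
    X_fut, y_fut = X[150:], y[150:]

    best_clf, best_score = None, -1.0
    for C in 10.0 ** np.linspace(-4, 4, 50):
        clf = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
        s = clf.score(X_te, y_te)          # repeated peeking at the test set
        if s > best_score:
            best_clf, best_score = clf, s

    print('selected test score:', best_score)             # optimistic
    print('future score:', best_clf.score(X_fut, y_fut))  # closer to 0.5

Is something like this the right way to provoke the effect on my real dataset, or is there a cleaner construction?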