The pipeline exported by TPOT states that the average CV score on the training set was -128.90187963562252 (neg_MAE).
However, refitting the pipeline on exactly the same training set yields a much smaller MAE, around 35.
Moreover, predicting on an unseen test set yields an MAE of around 140, which is in line with what the exported pipeline states.
I am a bit confused and wondering how to reproduce the reported score on the training set.
The pipeline seems to be overfitting, right?
from sklearn.model_selection import RepeatedKFold
from tpot import TPOTRegressor

cv = RepeatedKFold(n_splits=4, n_repeats=1, random_state=1)
model = TPOTRegressor(generations=10, population_size=25, offspring_size=None, mutation_rate=0.9,
                      crossover_rate=0.1, scoring='neg_mean_absolute_error', cv=cv,
                      subsample=0.75, n_jobs=-1, max_time_mins=None,
                      max_eval_time_mins=5, random_state=42, config_dict=None, template=None,
                      warm_start=False, memory=None,
                      use_dask=False, periodic_checkpoint_folder=None, early_stop=3, verbosity=2,
                      disable_update_check=False, log_file=None)
model.fit(train_df[x], train_df[y])
# The exported model
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from tpot.builtins import StackingEstimator
from tpot.export_utils import set_param_recursive

# Average CV score on the training set was: -128.90187963562252
exported_pipeline = make_pipeline(
    StackingEstimator(estimator=LassoLarsCV(normalize=True)),
    StackingEstimator(estimator=ExtraTreesRegressor(bootstrap=True,
                                                    max_features=0.4, min_samples_leaf=1,
                                                    min_samples_split=7, n_estimators=100)),
    PolynomialFeatures(degree=2, include_bias=False,
                       interaction_only=False),
    ExtraTreesRegressor(bootstrap=True,
                        max_features=0.15000000000000002, min_samples_leaf=9,
                        min_samples_split=7, n_estimators=100))
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
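To make the comparison concrete, here is a minimal sketch of the two numbers being compared: a cross-validated MAE (each fold is scored on rows the model did not see) versus the MAE of a model scored on the same rows it was fitted on. Everything in it is an assumption for illustration: the data comes from `make_regression` rather than my `train_df`, and a single `ExtraTreesRegressor` stands in for the exported pipeline. Only the `cv` object and the scoring string match the TPOT call above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic stand-in for train_df[x] / train_df[y] (assumption, not the real data).
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)

# Simple stand-in for the exported pipeline (assumption).
model = ExtraTreesRegressor(n_estimators=100, min_samples_split=7, random_state=42)

# Same splitter and scoring as passed to TPOTRegressor.
cv = RepeatedKFold(n_splits=4, n_repeats=1, random_state=1)

# Cross-validated score: each fold is predicted by a model that never saw it.
cv_scores = cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=cv)
cv_mae = -cv_scores.mean()

# In-sample score: fit on all rows, then predict the very same rows.
model.fit(X, y)
train_mae = mean_absolute_error(y, model.predict(X))

print(f"cross-validated MAE: {cv_mae:.2f}")
print(f"in-sample MAE:       {train_mae:.2f}")
```

With the real data, passing `exported_pipeline` and the training set to `cross_val_score` in the same way should give a number comparable to the reported CV score, though probably not identical, since as far as I understand `subsample=0.75` makes TPOT evaluate pipelines on a random 75% of the training rows.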
Thanks in advance