I'm trying to build a prediction model in auto-sklearn with 10-fold cross-validation. My dataset has about 40k rows and 80 features. Here is my code (where X holds my features and y is the continuous outcome variable):
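import autosklearn.regression

# 1 hour total search budget, at most 10 minutes per candidate model, 10-fold CV
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=3600, per_run_time_limit=600,
    resampling_strategy='cv',
    resampling_strategy_arguments={'folds': 10})
automl.fit(X, y, dataset_name='unused', feat_type=feature_types)
# as I understand it, with 'cv' the models must be refit on the full data before predicting
automl.refit(X.copy(), y.copy())
automl.cv_results_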
The output from the last line is a little confusing to me:
{'mean_fit_time': array([6.00111840e+02, 1.76325102e+01, 1.68442428e+01, 1.68408656e+00,
9.08970833e-01, 1.73636928e+01, 5.83850384e-01, 8.99704933e-01,
1.77676334e+01, 8.56771708e-01, 1.58957437e+02, 6.00050516e+02,
6.00073232e+02, 1.72906122e+01, 6.00116965e+02, 6.00113743e+02,
3.24114606e+02]),
'mean_test_score': array([0. , 0.2108587, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.2108587, 0. ,
0. , 0.2108587, 0. , 0. , 0. ]),
[results text is longer but I've deleted it due to character limits]
'rank_test_scores': array([4, 1, 4, 4, 4, 4, 4, 4, 4, 4, 1, 4, 4, 1, 4, 4, 4]),
'status': ['Timeout', 'Success', 'Memout', 'Crash', 'Memout', 'Memout', 'Crash', 'Crash', 'Crash', 'Memout', 'Success', 'Timeout', 'Crash', 'Success', 'Timeout', 'Timeout', 'Timeout']}
There is no mean_train_score, and it seems that most entries in mean_test_score are missing (reported as 0). Am I doing something wrong? I get the same issue when I allow the search to run for longer. I also get a worse R2 when I run 10-fold cross-validation than when I don't.
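To make the pattern easier to see, here is a small sketch that lines up the status and mean_test_score values pasted above. The lists are copied by hand from that output, and the variable names are my own. It only tallies what is already shown; reading a 0 as a placeholder for a failed run rather than a real score is my assumption, not something I have confirmed in the auto-sklearn documentation.

```python
from collections import Counter

# Values copied by hand from the cv_results_ output above.
status = ['Timeout', 'Success', 'Memout', 'Crash', 'Memout', 'Memout',
          'Crash', 'Crash', 'Crash', 'Memout', 'Success', 'Timeout',
          'Crash', 'Success', 'Timeout', 'Timeout', 'Timeout']
mean_test_score = [0.0, 0.2108587, 0.0, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.0, 0.2108587, 0.0,
                   0.0, 0.2108587, 0.0, 0.0, 0.0]

# How many runs ended in each status.
status_counts = Counter(status)
print(status_counts)

# Positions of the runs that succeeded, and of the runs with a non-zero score.
successful = [i for i, s in enumerate(status) if s == 'Success']
nonzero = [i for i, score in enumerate(mean_test_score) if score != 0.0]
print(successful, nonzero)

# True if every non-zero score belongs to a successful run and vice versa.
print(successful == nonzero)
```

Only 3 of the 17 runs have status 'Success', and those are exactly the three positions with a non-zero mean_test_score. All the other runs ended in Timeout, Memout, or Crash.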
Any guidance would be appreciated. Yara.