Yes, you can run RandomizedSearchCV without cross-validation and use a single train/test split for parameter tuning instead.
To do this, use the ShuffleSplit class from the sklearn.model_selection module to create a single train/test split for your parameter search. You can set this up with just one of the following lines of code:
from sklearn.model_selection import ShuffleSplit
my_cv = ShuffleSplit(n_splits=1)
my_cv = ShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
The first option creates a random train/test split with the default test size; the second lets you specify the test-set fraction explicitly and fix the random seed for reproducibility.
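To see what such a splitter produces, here is a minimal sketch on a toy array (the data is purely illustrative): with n_splits=1, split() yields exactly one (train indices, test indices) pair.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(10).reshape(-1, 1)  # toy data, 10 samples
splitter = ShuffleSplit(n_splits=1, test_size=0.3, random_state=0)

# split() yields (train_idx, test_idx) pairs; n_splits=1 gives exactly one.
for train_idx, test_idx in splitter.split(X):
    print(len(train_idx), len(test_idx))  # 7 train rows, 3 test rows
```

This is exactly the object RandomizedSearchCV will iterate over when you pass it as cv.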
Then pass it to RandomizedSearchCV by setting cv=my_cv.
One important point: in this setup, RandomizedSearchCV manages the train/test split for you, so you should feed it the complete dataset. Instead of fitting on (X_train, y_train), fit on (features, target) and let RandomizedSearchCV handle the data partitioning internally.
Here is how you can modify your code:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[('gbm', GradientBoostingClassifier())])
my_cv = ShuffleSplit(n_splits=1, test_size=0.33, random_state=0)  # <==========
param_dist = dict(gbm__max_depth=[3, 6, 10],
                  gbm__n_estimators=[50, 100, 500, 1000],
                  gbm__min_samples_split=[2, 5, 8, 11],
                  gbm__learning_rate=[0.01, 0.05, 0.1, 0.5, 1.0],
                  gbm__max_features=['sqrt', 'log2'])
grid_search = RandomizedSearchCV(pipe, param_distributions=param_dist, cv=my_cv)
grid_search.fit(features, target)  # <==========
In this code, my_cv is the single train/test split created with ShuffleSplit; you can adjust test_size and the other parameters to suit your needs.
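For completeness, here is a minimal end-to-end sketch of the same pattern on a synthetic dataset (make_classification and the reduced parameter grid are used here purely to keep the example small and fast; your real features and target go in their place):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit
from sklearn.pipeline import Pipeline

# Synthetic stand-in for your (features, target) data.
features, target = make_classification(n_samples=300, random_state=0)

pipe = Pipeline(steps=[('gbm', GradientBoostingClassifier())])
my_cv = ShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
param_dist = dict(gbm__max_depth=[3, 6],
                  gbm__n_estimators=[50, 100],
                  gbm__learning_rate=[0.05, 0.1])

search = RandomizedSearchCV(pipe, param_distributions=param_dist,
                            n_iter=5, cv=my_cv, random_state=0)
search.fit(features, target)

# best_params_ and best_score_ summarize the winning candidate;
# the score is computed on the single held-out test portion.
print(search.best_params_)
print(search.best_score_)
```

After fitting, best_params_ and best_score_ give you the tuned configuration and its score on the one held-out split.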
Hope this helps!