I am running quite a large parameter search using TuneGridSearchCV on an xgboost model on my university's HPC cluster. The results are being saved to ~/ray_results, but HPC policy doesn't give me enough space in my home directory to store all of those files. How can I move ray_results to a different folder that has more space? I've looked through the documentation, but I'm confused about how to do it.
My code is as follows:
import numpy as np
import pandas as pd
from pandas import MultiIndex, Int16Dtype
from sklearnex import patch_sklearn
patch_sklearn()
import xgboost as xgb
from tune_sklearn import TuneGridSearchCV
from datetime import datetime
import sys

if __name__ == "__main__":
    df_train = pd.read_excel('my_dataset.xlsx')
    train_cols = df_train.columns[df_train.columns != 'Response']
    X_train = pd.DataFrame(df_train, columns=train_cols)
    y_train = pd.DataFrame(df_train, columns=['Response'])

    params = {
        "n_estimators" : list(range(100, 1400, 100)),
        "max_depth" : list(range(2, 20, 2)),
        "min_child_weight" : list(range(2, 20, 2)),
        "gamma" : np.arange(0, 1.05, 0.1),
        "colsample_bytree" : np.arange(0.5, 1.05, 0.1),
        "colsample_bylevel" : np.arange(0.5, 1.05, 0.1),
        'reg_lambda': [0.1, 1.0, 5.0, 10.0, 25.0, 50.0]
    }

    xgb_model = xgb.XGBClassifier(seed=0, use_label_encoder = False, tree_method = 'hist')
    print(params)

    grid_cv = TuneGridSearchCV(xgb_model, param_grid = params, cv = 5, n_jobs = -1, scoring='roc_auc')

    current_time = datetime.now().strftime("%H:%M:%S")
    print("Start Time =", current_time)
    print('\n')

    grid_cv.fit(X_train, y_train.values.ravel())

    current_time = datetime.now().strftime("%H:%M:%S")
    print('End Time: ', current_time)
    print('\n\n')
    print('Grid best score (roc_auc): ')
    print(grid_cv.best_score_)
    print('\n\n')
    print('Grid best hyperparameters: ')
    print(grid_cv.best_params_)
    print('\n\n')
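From the tune-sklearn docstrings, the search classes appear to take a `local_dir` constructor argument whose default is "~/ray_results", so one option may be to point it at a larger filesystem. A minimal sketch, assuming that argument exists in the installed tune-sklearn version (worth checking with `help(TuneGridSearchCV)`) and using a hypothetical `SCRATCH` environment variable for the cluster's large-storage path; `scratch_results_dir` is likewise a hypothetical helper name:

```python
import os
from pathlib import Path


def scratch_results_dir(env_var: str = "SCRATCH", subdir: str = "ray_results") -> str:
    """Return (and create) a results directory outside the home directory.

    `SCRATCH` is a hypothetical variable name: many HPC clusters export a
    variable pointing at a larger filesystem, but the name varies by site.
    If it is unset, this falls back to the current working directory.
    """
    base = Path(os.environ.get(env_var, os.getcwd())).expanduser()
    target = base / subdir
    target.mkdir(parents=True, exist_ok=True)
    return str(target)


# Assumed usage with tune-sklearn (not run here): passing `local_dir` should
# override the default "~/ray_results" location for the trial folders.
#
# grid_cv = TuneGridSearchCV(xgb_model, param_grid=params, cv=5, n_jobs=-1,
#                            scoring="roc_auc",
#                            local_dir=scratch_results_dir())
```

The fallback to the working directory is a design choice for the sketch; on a cluster it is probably safer to fail loudly if the large-storage variable is missing.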
Alternatively, since the search currently creates a separate folder for every single parameter combination, is there a way to change the output format so that it is more space-efficient?
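If the per-trial folders themselves cannot be avoided, one fallback (a plain post-processing step, not a Ray Tune or tune-sklearn feature) would be to pack the finished results directory into a single compressed archive, replacing thousands of small files with one. A sketch using only the standard library; `archive_results` is a hypothetical helper name:

```python
import shutil
from pathlib import Path


def archive_results(results_dir: str, remove_original: bool = True) -> str:
    """Pack a finished results directory into a single .tar.gz next to it.

    Returns the path of the archive. If `remove_original` is True, the
    uncompressed directory tree is deleted after the archive is written.
    """
    src = Path(results_dir).expanduser().resolve()
    # base_name=str(src) puts "<name>.tar.gz" beside the directory; archiving
    # from the parent keeps "<name>/..." as the top-level folder in the tarball.
    archive = shutil.make_archive(
        str(src), "gztar", root_dir=str(src.parent), base_dir=src.name
    )
    if remove_original:
        shutil.rmtree(src)
    return archive
```

This only helps once the search has finished, so the trials still need enough room while the search runs.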