I am running quite a large parameter search using TuneGridSearchCV on an xgboost model on my university's HPC cluster. The results are being saved to ~/ray_results, but HPC policy limits the space in my home directory, so I can't store all the files there. How can I move ray_results to a different folder that has more space? I've looked into the documentation, but I am confused about how to do it.

My code is as follows:

import numpy as np
import pandas as pd
from pandas import MultiIndex, Int16Dtype
from sklearnex import patch_sklearn
patch_sklearn()

import xgboost as xgb

from tune_sklearn import TuneGridSearchCV

from datetime import datetime
import sys


if __name__ == "__main__":

    df_train = pd.read_excel('my_dataset.xlsx')

    train_cols = df_train.columns[df_train.columns != 'Response']

    X_train = pd.DataFrame(df_train, columns=train_cols)
    y_train = pd.DataFrame(df_train, columns=['Response'])

    params = {
        "n_estimators": list(range(100, 1400, 100)),
        "max_depth": list(range(2, 20, 2)),
        "min_child_weight": list(range(2, 20, 2)),
        "gamma": np.arange(0, 1.05, 0.1),
        "colsample_bytree": np.arange(0.5, 1.05, 0.1),
        "colsample_bylevel": np.arange(0.5, 1.05, 0.1),
        "reg_lambda": [0.1, 1.0, 5.0, 10.0, 25.0, 50.0],
    }

    xgb_model = xgb.XGBClassifier(seed=0, use_label_encoder=False, tree_method='hist')
    print(params)

    grid_cv = TuneGridSearchCV(xgb_model, param_grid=params, cv=5, n_jobs=-1, scoring='roc_auc')

    current_time = datetime.now().strftime("%H:%M:%S")
    print("Start Time =", current_time)
    print('\n')

    grid_cv.fit(X_train, y_train.values.ravel())

    current_time = datetime.now().strftime("%H:%M:%S")
    print('End Time: ', current_time)
    print('\n\n')

    print('Grid best score (roc_auc): ')
    print(grid_cv.best_score_)
    print('\n\n')
    print('Grid best hyperparameters: ')
    print(grid_cv.best_params_)
    print('\n\n')

Alternatively, instead of creating a folder for every single parameter combination (which is what it is doing), is there a way to change the format of the output to be more space efficient?

shoopdoop

1 Answer

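You should be able to set this by passing local_dir alongside your existing arguments, e.g. TuneGridSearchCV(xgb_model, param_grid=params, local_dir="YOUR_PATH"). Ray Tune should then write its trial folders under that path instead of ~/ray_results.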
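
As a rough sketch of how that might look on a cluster: the snippet below picks a results folder on a larger filesystem and passes it as local_dir. The SCRATCH environment variable and the helper names resolve_results_dir and build_search are assumptions for illustration only; substitute whatever path your HPC site provides.

```python
import os
from pathlib import Path


def resolve_results_dir(env_var="SCRATCH", subdir="ray_results"):
    """Return a results directory under $SCRATCH (an assumed variable name).

    Falls back to the home directory if the variable is not set.
    """
    base = Path(os.environ.get(env_var, Path.home()))
    results_dir = base / subdir
    # Create the folder up front so a bad path fails here, not mid-search.
    results_dir.mkdir(parents=True, exist_ok=True)
    return str(results_dir)


def build_search(estimator, param_grid, results_dir):
    """Build the grid search with results redirected to results_dir."""
    # Imported lazily so the path helper works without tune-sklearn installed.
    from tune_sklearn import TuneGridSearchCV

    return TuneGridSearchCV(
        estimator,
        param_grid=param_grid,
        cv=5,
        n_jobs=-1,
        scoring="roc_auc",
        local_dir=results_dir,  # used in place of the default ~/ray_results
    )
```

In the question's script, this would replace the TuneGridSearchCV line with something like grid_cv = build_search(xgb_model, params, resolve_results_dir()).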

richliaw