How do I run multiple jobs in an XGBoost model when I combine multiple scripts to call different functions?

Question

I have a library of different functions, for example my XGBoost model.

I am doing a forecast analysis, so I have a separate script that imports these functions to produce the forecasts.

I am cross-validating the model over a grid of parameter values and need more computational power, otherwise the function will take days to finish.

The code runs when I set n_jobs = 1, but raises the following error when I set n_jobs = CPU_cores - 1:

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
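One thing that may be worth checking for an error like this: pickle stores a module-level function as a reference (module name plus qualified name) rather than as code, so whichever process un-serializes the task has to be able to import that module. The sketch below is a minimal diagnostic using only the standard library. `pickle_reference` and `survives_round_trip` are hypothetical helper names, and a successful round trip in the parent process does not guarantee that the worker processes can import `library` too.

```python
import json
import pickle


def pickle_reference(func):
    """Return the (module, qualified name) pair under which pickle looks up a function."""
    return func.__module__, func.__qualname__


def survives_round_trip(obj):
    """Return True if obj can be pickled and un-pickled in the current process."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False


# A function from an importable module is pickled by reference to that module.
print(pickle_reference(json.dumps))

# A lambda has no importable name, so the standard pickle module rejects it.
print(survives_round_trip(json.dumps), survives_round_trip(lambda x: x))
```

The same two helpers could be pointed at the imported forecasting function and at the data frames passed to it, to see which module name the workers would need to import.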

This is the code from the library.py script:

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def xgb_pprediction(ydf, Xdf, Prediction_formation):
    """
    ydf = inflation
    Xdf = explanatory variables (FRED-MD data)
    Prediction_formation = the set of lags and the time until prediction,
                           making sure that only real-time data is used
    """
    # prediction_formation_info is a helper that is not shown here
    lag, delta = prediction_formation_info(Prediction_formation)

    # Standardize the regressors and the lagged target
    Xdf_norm = (Xdf - Xdf.mean()) / Xdf.std()
    ylag = ydf.shift(periods=lag)
    ylag_norm = (ylag - ylag.mean()) / ylag.std()
    ylag_norm.columns = ['AR']

    # Column 0 is the target; the last row is held out for the forecast
    df = pd.concat([ydf, ylag_norm, Xdf_norm], axis=1).dropna()
    dfTrain = df.iloc[:-lag]
    dfNew = df.iloc[[-1]]

    # The model configuration sets the number of estimators and
    # the grid of parameters to search over
    model_config = {
        "n_estimators": 400,
        "grid": {
            "eta": [0.1, 0.2, 0.3],
            "max_depth": [1, 2, 3, 4],
            "subsample": [0.5, 0.8, 1],
            "colsample_bytree": [0.7, 1],
            "min_child_weight": [1, 3],
        },
    }

    model = xgb.XGBRegressor(n_estimators=model_config["n_estimators"],
                             n_jobs=6, random_state=123)

    tss = TimeSeriesSplit(n_splits=5, gap=lag, test_size=1)

    model = GridSearchCV(estimator=model, param_grid=model_config["grid"],
                         cv=tss, verbose=100, n_jobs=6,
                         scoring="neg_mean_squared_error", refit=True)

    model.fit(dfTrain.iloc[:, 1:], dfTrain.iloc[:, 0])

    best_model = model.best_params_
    print(model.best_params_)

    # Refit on the training data with the best parameters found
    final_model = xgb.XGBRegressor(n_estimators=model_config["n_estimators"],
                                   random_state=123,
                                   n_jobs=6, **best_model).fit(dfTrain.iloc[:, 1:], dfTrain.iloc[:, 0])

    yhat = final_model.predict(dfNew.iloc[:, 1:])

    # Forecast error for the held-out observation
    eps = yhat - dfNew.iloc[0, 0]

    return [yhat, eps]
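For scale, the size of the search in the function above can be counted with the standard library alone. This is a rough sketch with illustrative variable names, not part of library.py. The thread figure is only an upper bound on what is requested when both GridSearchCV and XGBRegressor are given n_jobs=6; how many threads actually run at once depends on the parallel backend.

```python
from itertools import product

# Parameter grid and settings copied from the function above
grid = {
    "eta": [0.1, 0.2, 0.3],
    "max_depth": [1, 2, 3, 4],
    "subsample": [0.5, 0.8, 1],
    "colsample_bytree": [0.7, 1],
    "min_child_weight": [1, 3],
}
n_splits = 5       # TimeSeriesSplit(n_splits=5, ...)
outer_jobs = 6     # n_jobs passed to GridSearchCV
inner_jobs = 6     # n_jobs passed to XGBRegressor

# Every combination of grid values is one candidate model
n_candidates = len(list(product(*grid.values())))

# Each candidate is fitted once per split; refit=True adds one final fit
cv_fits = n_candidates * n_splits
total_fits = cv_fits + 1

# Upper bound on threads requested when both levels run in parallel
requested_threads = outer_jobs * inner_jobs

print(n_candidates, cv_fits, total_fits, requested_threads)  # 144 720 721 36
```

Each of those fits trains 400 estimators, and this count is for a single call of the function.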

How do I run multiple jobs in an XGBoost model when I combine multiple scripts to call different functions?
