I have a library of functions, including my XGBoost model.
I am doing a forecast analysis, so I have a separate script that imports these functions to produce the forecasts.
I am cross-validating the model over a grid of parameter values and need more computational power, otherwise the function will take days to finish.
The code runs when I set n_jobs = 1, but raises the following error when I set n_jobs = CPU_cores - 1:
BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
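Since the error message asks for picklable arguments, one rough first check is to round-trip each argument through the standard pickle module. The sketch below is only illustrative: find_unpicklable and the sample arguments are hypothetical names, not part of my library. It is also an imperfect test. As far as I understand, joblib's default backend serializes with cloudpickle, which accepts more objects than plain pickle, and a failure to un-serialize may also come from a worker process that cannot import the module defining the function, which a same-process check like this cannot detect.

```python
import pickle


def find_unpicklable(**named_args):
    """Return {name: error} for every argument that fails a pickle round trip.

    Hypothetical helper for illustration; not part of joblib or scikit-learn.
    """
    failures = {}
    for name, value in named_args.items():
        try:
            # Serialize and immediately deserialize in the current process.
            pickle.loads(pickle.dumps(value))
        except Exception as exc:  # the exception type depends on the object
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Made-up sample arguments: two plain objects and one lambda,
# which the standard pickle module cannot serialize.
problems = find_unpicklable(
    lag=3,
    grid={"eta": [0.1, 0.2, 0.3]},
    callback=lambda x: x,
)
print(sorted(problems))
```

If every argument passes, the arguments themselves are probably not the cause, and the next thing to check would be whether the worker processes can import library.py.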
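This is the code from the library.py script:

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

def xgb_pprediction(ydf, Xdf, Prediction_formation):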
    """
    ydf = inflation
    Xdf = explanatory variables (FRED-MD data)
    Prediction_formation = the set of lags and the time until prediction,
        chosen so that only real-time data is used
    """
    # prediction_formation_info is defined elsewhere in the library (not shown)
    lag, delta = prediction_formation_info(Prediction_formation)

    # Standardize the regressors and the lagged target
    Xdf_norm = (Xdf - Xdf.mean()) / Xdf.std()
    ylag = ydf.shift(periods=lag)
    ylag_norm = (ylag - ylag.mean()) / ylag.std()
    ylag_norm.columns = ['AR']

    # Column 0 is the target; the remaining columns are the features
    df = pd.concat([ydf, ylag_norm, Xdf_norm], axis=1).dropna()
    dfTrain = df.iloc[:-lag]
    dfNew = df.iloc[[-1]]

    # The model configuration sets the number of estimators and
    # the grid of parameters to search over
    model_config = {
        "n_estimators": 400,
        "grid": {
            "eta": [0.1, 0.2, 0.3],
            "max_depth": [1, 2, 3, 4],
            "subsample": [0.5, 0.8, 1],
            "colsample_bytree": [0.7, 1],
            "min_child_weight": [1, 3]
        }
    }

    model = xgb.XGBRegressor(n_estimators=model_config["n_estimators"],
                             n_jobs=6, random_state=123)
    tss = TimeSeriesSplit(n_splits=5, gap=lag, test_size=1)

    # Grid search over the time-series splits; this is the call that fails
    # when n_jobs is greater than 1
    model = GridSearchCV(estimator=model, param_grid=model_config["grid"],
                         cv=tss, verbose=100, n_jobs=6,
                         scoring="neg_mean_squared_error", refit=True)
    model.fit(dfTrain.iloc[:, 1:], dfTrain.iloc[:, 0])
    best_model = model.best_params_
    print(best_model)

    # Refit on the training data with the best parameters and forecast
    final_model = xgb.XGBRegressor(n_estimators=model_config["n_estimators"],
                                   random_state=123,
                                   n_jobs=6, **best_model).fit(dfTrain.iloc[:, 1:], dfTrain.iloc[:, 0])
    yhat = final_model.predict(dfNew.iloc[:, 1:])
    eps = yhat - dfNew.iloc[0, 0]  # forecast error
    return [yhat, eps]
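
To give a sense of why the run is slow with n_jobs = 1, the size of the search can be counted directly from the grid and the number of splits used above. This is only a back-of-the-envelope sketch: the variable names are mine, and it counts model fits per call of xgb_pprediction, not run time.

```python
from itertools import product

# Same grid and number of splits as in xgb_pprediction above.
grid = {
    "eta": [0.1, 0.2, 0.3],
    "max_depth": [1, 2, 3, 4],
    "subsample": [0.5, 0.8, 1],
    "colsample_bytree": [0.7, 1],
    "min_child_weight": [1, 3],
}
n_splits = 5

# Every combination of grid values is one candidate for GridSearchCV.
candidates = list(product(*grid.values()))
n_candidates = len(candidates)

# Each candidate is fitted once per cross-validation split.
n_cv_fits = n_candidates * n_splits

print(f"{n_candidates} candidates, {n_cv_fits} cross-validation fits")
```

On top of the cross-validation fits, refit=True adds one fit on the full training data and final_model adds another, each with 400 estimators, and all of this is repeated for every forecast date in the calling script.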