I have a classification problem to solve and use different classifiers for the task. I use `cross_val_score` and `cross_val_predict` for validation and prediction. Both of them, as well as the estimator itself (e.g. `LGBMClassifier`), support parallelization. I have 46 physical cores, each of which has 2 logical cores, so 92 in total. How should I set the `n_jobs` parameter in all of these functions to achieve the best performance?
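The physical vs. logical distinction matters here because Python's `os.cpu_count()` reports logical CPUs, while LightGBM's parameter docs recommend setting `num_threads` (which the scikit-learn wrapper's `n_jobs` maps to) to the number of real CPU cores rather than hyper-threads. A minimal sketch for checking both counts, assuming the optional third-party `psutil` package may or may not be installed (the helper name `core_counts` is hypothetical):

```python
import os


def core_counts():
    """Hypothetical helper: return (logical, physical) CPU counts; either may be None if unknown."""
    # os.cpu_count() counts logical CPUs, i.e. hyper-threads are included.
    logical = os.cpu_count()
    try:
        # psutil is an optional third-party package, not part of the stdlib.
        import psutil
        physical = psutil.cpu_count(logical=False)
    except ImportError:
        physical = None
    return logical, physical


logical, physical = core_counts()
print(f"logical CPUs: {logical}, physical cores: {physical}")
```

On the machine described above, this would presumably report 92 logical CPUs and 46 physical cores.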
```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score, cross_val_predict

# X, y = predefined data
model = LGBMClassifier(n_estimators=100, tree_learner='feature', n_jobs=???)
score = cross_val_score(model, X, y, cv=5, n_jobs=???)
```
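The two `???` values interact: `cross_val_score` parallelizes over the cross-validation splits, and each of those parallel fits then starts its own LightGBM threads, so the total thread count is roughly the product of the two `n_jobs` values. A minimal sketch of one common heuristic, assuming the goal is simply to keep that product at or below the physical core count to avoid oversubscription (the helper `split_thread_budget` is hypothetical, not part of scikit-learn or LightGBM):

```python
def split_thread_budget(physical_cores: int, n_folds: int) -> tuple[int, int]:
    """Hypothetical helper: pick (cv_n_jobs, model_n_jobs) so that
    cv_n_jobs * model_n_jobs <= physical_cores."""
    if physical_cores < 1 or n_folds < 1:
        raise ValueError("physical_cores and n_folds must be positive")
    # More parallel fold workers than folds would just sit idle.
    cv_n_jobs = min(n_folds, physical_cores)
    # Give each fold worker an equal share of the cores.
    model_n_jobs = max(1, physical_cores // cv_n_jobs)
    return cv_n_jobs, model_n_jobs


print(split_thread_budget(46, 5))  # (5, 9): 5 folds x 9 threads = 45 cores
```

This is only one way to split the budget; whether it beats, say, `cv_n_jobs=1` with all 46 threads given to the model depends on how well LightGBM scales on this data and would need to be measured.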
My guess is that the estimator's `n_jobs` should depend on the parallelization technique; e.g. for the `feature` case it should be equal to the number of features. As for validation, it probably should depend on the number of folds. But this is only a guess. Is there an authoritative answer?
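The guess that the validation-side `n_jobs` should depend on the number of folds can be made concrete with a little arithmetic. A minimal sketch, assuming every fold takes about the same time to fit and that fold fits are handed out to `cv_n_jobs` workers in rounds ("waves"); the helper `candidate_configs` is hypothetical:

```python
import math


def candidate_configs(physical_cores: int, n_folds: int):
    """Hypothetical helper: yield (cv_n_jobs, model_n_jobs, waves) candidates to benchmark."""
    for cv_n_jobs in range(1, n_folds + 1):
        # Split the physical cores evenly among the parallel fold fits.
        model_n_jobs = physical_cores // cv_n_jobs
        # Rounds of fold fits needed if all fits take equal time.
        waves = math.ceil(n_folds / cv_n_jobs)
        yield cv_n_jobs, model_n_jobs, waves


for cfg in candidate_configs(46, 5):
    print(cfg)
```

Under that equal-duration assumption, `cv_n_jobs=3` or `4` still needs two waves, just like `cv_n_jobs=3`, while giving each model fewer threads than `cv_n_jobs=2` would need three waves; so values that evenly cover the folds (here 1 or 5) are the natural ones to time against each other.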