I have a classification problem and use several classifiers to solve it. I use cross_val_score and cross_val_predict for validation and prediction. Both of these functions, as well as the estimators themselves (e.g. LGBMClassifier), support parallelization through n_jobs. I have 46 physical cores, each with 2 logical cores, so 92 in total. How should I set the n_jobs parameter in each of these to achieve the best performance?

from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score, cross_val_predict

# X, y = predefined data
model = LGBMClassifier(n_estimators = 100, tree_learner='feature', n_jobs = ???)

score = cross_val_score(model, X, y, cv=5, n_jobs = ???)

My guess is that the estimator's n_jobs should depend on the parallelization technique, e.g. with tree_learner='feature' it should equal the number of features, and that n_jobs for cross-validation should depend on the number of folds. But this is only a guess. Is there an authoritative answer?
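To make the guess about folds concrete, here is a minimal sketch of one possible split. It is not a verified answer. It assumes that the physical core count (46) is the budget and that the product of the two n_jobs values should not exceed it, so the two levels of parallelism do not oversubscribe the machine. The helper name split_core_budget is hypothetical and is not part of scikit-learn or LightGBM.

```python
def split_core_budget(physical_cores, n_folds):
    """Split a core budget between cross-validation and the estimator.

    Hypothetical helper: assumes outer_jobs * inner_jobs should stay
    at or below the number of physical cores.
    """
    # At most one outer worker per fold; more would sit idle.
    cv_jobs = max(1, min(n_folds, physical_cores))
    # Each outer worker gets an equal share of the remaining budget.
    model_jobs = max(1, physical_cores // cv_jobs)
    return cv_jobs, model_jobs


cv_jobs, model_jobs = split_core_budget(physical_cores=46, n_folds=5)
print(cv_jobs, model_jobs)

# Intended use with the snippet from the question (X, y predefined):
# model = LGBMClassifier(n_estimators=100, tree_learner='feature', n_jobs=model_jobs)
# score = cross_val_score(model, X, y, cv=5, n_jobs=cv_jobs)
```

With 46 physical cores and 5 folds this gives 5 cross-validation workers with 9 threads each, 45 threads in total. Whether this beats putting all cores into one level would have to be measured on the actual data.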

Nourless

0 Answers