I trained an XGBClassifier for my classification problem and did hyper-parameter tuning over a large grid (probably tuned every possible parameter) using Optuna. While testing, I noticed that changing random_state changes the model's performance metrics (roc_auc/recall/precision), the feature importances, and even the model's predictions (predict_proba).
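For context, here is a minimal sketch of the kind of check I mean. The data is synthetic and the hyper-parameters are placeholders, not my actual tuned setup; only random_state changes between runs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Synthetic data as a stand-in for my real dataset
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

aucs, probas = [], []
for seed in range(10):
    # Same (placeholder) hyper-parameters, only the seed changes
    model = XGBClassifier(
        n_estimators=300,
        max_depth=4,
        learning_rate=0.05,
        subsample=0.8,
        colsample_bytree=0.8,
        random_state=seed,
    )
    model.fit(X_tr, y_tr)
    p = model.predict_proba(X_te)[:, 1]
    probas.append(p)
    aucs.append(roc_auc_score(y_te, p))

print("AUC mean / std over seeds:", np.mean(aucs), np.std(aucs))
# Largest per-row spread of predicted probabilities across seeds
print("Max per-row probability spread:", np.ptp(np.vstack(probas), axis=0).max())
```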
- What does this tell me about my data?
Since I have to take this model to production, how should I tackle this so that the model is more robust? Two options I'm considering:
- Stick with one random_state (say the default, 0) during cross-validation and use the same one out-of-sample as well.
- During cross-validation, for each parameter combination, run a few random_states (say 10) and take the average model performance (see the sketch after this list).
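A rough sketch of what I mean by the second option. The search space and CV setup here are simplified placeholders, not my real grid; the point is that each Optuna trial scores the same parameter combination under several seeds and returns the average:

```python
import numpy as np
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

def objective(trial):
    # Simplified search space, just to illustrate the idea
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 8),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.6, 1.0),
    }
    # Score the same parameter combination under several seeds and average,
    # so the tuner favours parameters that are stable across random_state
    seed_scores = []
    for seed in range(10):
        model = XGBClassifier(random_state=seed, **params)
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        seed_scores.append(scores.mean())
    return np.mean(seed_scores)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```

I could also return something like np.mean(seed_scores) - np.std(seed_scores) instead of the plain mean, to explicitly penalise parameter combinations that are sensitive to the seed.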