
I came across this question while working on a scikit-learn ML case with heavily imbalanced data. The line below is the basis for assessing the model from confusion-matrix and precision-recall perspectives, but it is a combined train/predict method:

from sklearn import model_selection

y_pred = model_selection.cross_val_predict(model, X, Y, cv=kfold)

The question is: how do I leverage this 'cross-val-trained' model to:

1) predict on another (identically scaled) data set, instead of having to train and predict each time?

2) export/serialize/deploy the model to predict on live data?

model.predict()  # --> nope, needs a fit() first

model.fit()  # --> nope, produces a different model that does not take advantage of the cross_val_* methods

Any help is appreciated.

1 Answer

You can fit a new model on all the data.

Cross-validation validates the way the model is built, not a particular fitted model. So if the cross-validation results are OK, you can train a new model on all the data.

(See my response here as well for more details Fitting sklearn GridSearchCV model)
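In code, that workflow might look like the sketch below. It is a minimal illustration, not the asker's actual pipeline: `RandomForestClassifier`, the synthetic imbalanced data, and the `model.joblib` file name are all assumptions added for the example; joblib is one common way to serialize a fitted scikit-learn estimator for deployment.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Hypothetical imbalanced data standing in for the asker's X, Y.
X, Y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
model = RandomForestClassifier(random_state=0)

# 1) Use cross-validation to validate the modelling procedure.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, Y, cv=kfold)
# ...inspect confusion matrix / precision-recall on y_pred here...

# 2) If validation looks good, fit a final model on ALL the data.
model.fit(X, Y)

# 3) Serialize the fitted model, then reload it to predict on new data.
joblib.dump(model, "model.joblib")
loaded = joblib.load("model.joblib")
# loaded.predict(X_live)  # predict on new, identically scaled data
```

The point is that `cross_val_predict` never hands back a fitted model; the final `fit` on all the data is a separate step, and that fitted estimator is what you serialize and deploy.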

Matthieu Brucher
  • I am using bagging and boosting algorithms to avoid the bias toward the majority class due to the heavy imbalance. However, these algorithms require a call to the model_selection.cross_val_* methods in order to score or predict. If I do a model.fit(), it is a single pass and does not address the bias issue. Let me know if I understand it incorrectly? – user10735321 Dec 02 '18 at 21:57
  • Thank you Matthieu, I got your point. The model is what it is, based on the parameters provided. Cross-validation does not add value to the model other than scoring it. – user10735321 Dec 05 '18 at 22:56
  • Sorry, didn't see your message earlier. I think you got it, yes. – Matthieu Brucher Dec 05 '18 at 23:14