Saving multiple trained models with sklearn

Question

I'm trying to save multiple trained models for binary classification using pickle. I store each one in a pickle file as a (name, model) tuple. However, when I load them and try to use the models, all the predictions are zero. This does not happen if I train and predict in the same run. Here is the part of the code where I save the models:

    else:
        # Simple pipeline
        if dimension_reduction is None:
            model = Pipeline(steps=[('ss', scaler), ('clf', clf)])
        else:
            model = Pipeline(steps=[('ss', scaler), ('dr', dimension_reduction), ('clf', clf)])

        # Training the model  
        model.fit(X_train, y_train) 
        
        # Saving into a list
        models.append((name,model))

# Saving into a pickle file
with open("sklearn_models_{}.pckl".format(models_name), "wb") as f:
    for model in models:
        pickle.dump(model, f)
print('Models saved')

I load them using

models = []
with open("sklearn_models_{}.pckl".format(NAME_MODELS), "rb") as f:
    while True:
        try:
            models.append(pickle.load(f))
        except EOFError:
            break

The complete code is at this link. The training part shown here is lines 166 to 183, and the loading part is lines 361 to 368.

Is there a better way to do this? What am I doing wrong?
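
One possible simplification, offered as a sketch rather than a fix for the zero predictions: dump the whole `models` list in a single `pickle.dump` call, so loading is one `pickle.load` and the `EOFError` loop is no longer needed. The helper names `save_models` and `load_models` are hypothetical, and the sketch assumes every object in the list can be pickled.

```python
import pickle


def save_models(models, path):
    """Write the whole list of (name, model) tuples with one pickle.dump call."""
    with open(path, "wb") as f:
        pickle.dump(models, f)


def load_models(path):
    """Read back the list written by save_models with one pickle.load call."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With the code above, this would be called as `save_models(models, "sklearn_models_{}.pckl".format(models_name))` after training, and `models = load_models(...)` in the prediction script.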

So, if you retrieve a `model` from `models` before pickling/dumping, prediction works fine? — rickhg12hs, Mar 22 '22 at 02:37
Yes. Actually, it also works after pickling/dumping: when I run training --> pickling --> dumping in one go, predictions are fine. It only fails when I skip the training and pickling steps and just load the already trained and saved models. — Mariana Vivas, Mar 22 '22 at 19:15
I haven't run your full code, but when I `pickle.dump`/`pickle.load` tuples like yours, everything works and `model.predict` behaves as expected. If you `pickle.load` and then call `model.predict(X_train)`, do you get the same results as before saving? — rickhg12hs, Mar 23 '22 at 14:16
No, when I load `models` and try to predict on the exact same dataset I get that all predictions are zero. — Mariana Vivas, Mar 23 '22 at 16:46
It won't be fun, but if you modify your code to store all the models in a string with `pickle.dumps(...` and then later `pickle.loads(...`, do the models then `predict` as expected? I'm wondering if there is a `pickle` problem or perhaps some disk save/load problem. — rickhg12hs, Mar 23 '22 at 22:57
I solved it. I'm using a tensorflow model too. For some reason, if you load the tensorflow model last it ruins the predictions for the sklearn models. — Mariana Vivas, Mar 24 '22 at 15:55
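
The check suggested in the comments can be sketched as follows: round-trip the list through `pickle.dumps`/`pickle.loads` in memory, then compare predictions from the original and restored models on the same data. The function names are hypothetical, and the comparison assumes each model has a `predict` method that returns a one-dimensional sequence of labels.

```python
import pickle


def roundtrip_in_memory(models):
    """Serialize and restore the (name, model) list without touching disk."""
    return pickle.loads(pickle.dumps(models))


def compare_predictions(original, restored, X):
    """Map each model name to True if predictions match after the round trip.

    Assumes predict(X) returns a 1-D sequence of labels for every model.
    """
    results = {}
    for (name, before), (_, after) in zip(original, restored):
        results[name] = list(before.predict(X)) == list(after.predict(X))
    return results
```

If this check passes but the models loaded from disk still predict all zeros, the cause is probably outside pickle itself, which would be consistent with the load-order interaction with the tensorflow model reported above.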
