I'm trying to save multiple trained binary-classification models with pickle. I write each one to a pickle file as a `(name, model)` tuple. However, when I load them and use them to predict, all the predictions are zero. This doesn't happen if I train and predict in a single run. Here is the part of the code where I save the models:

    else:
        # Simple pipeline
        if dimension_reduction is None:
            model = Pipeline(steps=[('ss', scaler), ('clf', clf)])
        else:
            model = Pipeline(steps=[('ss', scaler), ('dr', dimension_reduction), ('clf', clf)])

        # Training the model  
        model.fit(X_train, y_train) 
        
        # Saving into a list
        models.append((name,model))

    # Saving into a pickle file
    with open("sklearn_models_{}.pckl".format(models_name), "wb") as f:
        for model in models:
            pickle.dump(model, f)
    print('Models saved')
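
On the "better way" question: instead of one `pickle.dump` call per tuple and a read-until-`EOFError` loop, a common alternative is to put all fitted pipelines in a single container (here a dict keyed by name) and dump it in one call. This is a minimal sketch assuming scikit-learn is installed; `make_classification`, `StandardScaler`, `LogisticRegression`, and `DecisionTreeClassifier` are stand-ins for the question's data, `scaler`, and `clf` objects, and the file name `sklearn_models_demo.pckl` is hypothetical. It also checks that the reloaded pipelines reproduce the in-memory predictions exactly, which quickly shows whether the saved file is at fault.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (stand-in for X_train, y_train).
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

# Hypothetical classifiers standing in for the question's `clf` objects.
classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Fit one pipeline per classifier and keep them in a dict keyed by name.
models = {}
for name, clf in classifiers.items():
    model = Pipeline(steps=[("ss", StandardScaler()), ("clf", clf)])
    model.fit(X_train, y_train)
    models[name] = model

# Hypothetical file location in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "sklearn_models_demo.pckl")

# Save every model with a single pickle.dump call.
with open(path, "wb") as f:
    pickle.dump(models, f)

# Load them all back with a single pickle.load call.
with open(path, "rb") as f:
    loaded_models = pickle.load(f)

# The reloaded pipelines should reproduce the original predictions exactly.
for name, model in models.items():
    before = model.predict(X_train)
    after = loaded_models[name].predict(X_train)
    print(name, "identical predictions:", (before == after).all())
```

A dict also makes lookup by name direct (`loaded_models["logreg"]`). scikit-learn's model-persistence documentation additionally describes `joblib.dump`/`joblib.load`, which can be more efficient for objects carrying large NumPy arrays.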

I load them with:

    models = []
    with open("sklearn_models_{}.pckl".format(NAME_MODELS), "rb") as f:
        while True:
            try:
                models.append(pickle.load(f))
            except EOFError:
                break
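
Reading until `EOFError` is a valid way to recover several objects written by consecutive `pickle.dump` calls to the same file. The following standard-library-only sketch wraps the same pattern in a generator; the helper names `dump_all` and `load_all` are hypothetical, an in-memory buffer replaces the `.pckl` file, and plain tuples stand in for the `(name, model)` pairs.

```python
import io
import pickle


def dump_all(items, f):
    """Write each item with its own pickle.dump call, as in the question."""
    for item in items:
        pickle.dump(item, f)


def load_all(f):
    """Yield objects from consecutive pickle.dump calls until the stream ends."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:  # raised once there is no more data to read
            return


# An in-memory buffer stands in for the .pckl file on disk.
buffer = io.BytesIO()
dump_all([("model_a", {"threshold": 0.5}), ("model_b", {"threshold": 0.7})], buffer)
buffer.seek(0)

models = list(load_all(buffer))
print(models)  # → [('model_a', {'threshold': 0.5}), ('model_b', {'threshold': 0.7})]
```

Since this round-trips correctly, the reading loop on its own is an unlikely cause of all-zero predictions.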

The complete code is at this link. The training excerpt above is lines 166 to 183 of it, and the loading excerpt is lines 361 to 368.

Is there a better way to do this? What am I doing wrong?

  • So, if you retrieve a `model` from `models` before pickling/dumping, prediction works fine? – rickhg12hs Mar 22 '22 at 02:37
  • Yes, and after pickling/dumping too: when I run training, pickling, and dumping in one go, prediction works fine. It's only when I skip training and just load the already-trained, saved models that it fails. – Mariana Vivas Mar 22 '22 at 19:15
  • I haven't run your full code, but when I `pickle.dump`/`pickle.load` tuples like yours, everything works and my `model.predict` calls give the expected results. If you `pickle.load` and then call `model.predict(X_train)`, do you get the same results as before, i.e., are the earlier predictions reproduced? – rickhg12hs Mar 23 '22 at 14:16
  • No. When I load `models` and predict on the exact same dataset, all the predictions are zero. – Mariana Vivas Mar 23 '22 at 16:46
  • It won't be fun, but if you modify your code to serialize all the models to a bytes object with `pickle.dumps(...)` and later restore them with `pickle.loads(...)`, do the models then `predict` as expected? I'm wondering whether this is a `pickle` problem or a disk save/load problem. – rickhg12hs Mar 23 '22 at 22:57
  • I solved it. I'm also using a TensorFlow model, and for some reason, if the TensorFlow model is loaded last, it ruins the predictions of the sklearn models. – Mariana Vivas Mar 24 '22 at 15:55
  • Wow, that's weird. Glad you fixed it! – rickhg12hs Mar 24 '22 at 21:06
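
Because the failure depended on load order rather than on the pickle file itself, one defensive habit is to store a few reference inputs and their predictions alongside the models, then re-check them after everything (including any TensorFlow model) has been loaded. This is a hedged, standard-library-only sketch: `build_bundle` and `find_mismatches` are hypothetical helpers, `ThresholdModel` is a toy stand-in for a fitted scikit-learn pipeline, and it assumes each model exposes a `predict` method. It also uses the in-memory `pickle.dumps`/`pickle.loads` round-trip suggested in the comments; `pickle.dump`/`pickle.load` on a file works the same way.

```python
import pickle


class ThresholdModel:
    """Toy stand-in for a fitted classifier: predicts 1 when x > threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [1 if x > self.threshold else 0 for x in X]


def build_bundle(models, X_ref):
    """Package (name, model) pairs with their predictions on reference inputs."""
    reference = {name: list(model.predict(X_ref)) for name, model in models}
    return {"models": models, "X_ref": X_ref, "reference": reference}


def find_mismatches(bundle):
    """Return the names of models whose predictions no longer match the reference."""
    return [
        name
        for name, model in bundle["models"]
        if list(model.predict(bundle["X_ref"])) != bundle["reference"][name]
    ]


models = [("low", ThresholdModel(0.2)), ("high", ThresholdModel(0.8))]
payload = pickle.dumps(build_bundle(models, [0.1, 0.5, 0.9]))

bundle = pickle.loads(payload)
# ... load any other models (e.g. the TensorFlow one) here, then re-check ...
print(find_mismatches(bundle))  # → []
```

An empty list means every model still reproduces its saved predictions; any name in the list flags a model that was silently affected.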
