How to distribute sklearn models so that they work on different architectures?

Question

I have two models:

sklearn.linear_model.Lasso
sklearn.ensemble.GradientBoostingRegressor

I'm using both to solve the same problem. Once trained, I persist the models using joblib so that I can publish them for others to use. However, when I tried loading the dumped Gradient Boosting model on a 32-bit Python installation (having trained it on a 64-bit Python installation), I received this error:

  File "sklearn\tree\_tree.pyx", line 607, in sklearn.tree._tree.Tree.__cinit__
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'
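
For reference, the bitness of each Python installation can be confirmed with the standard library alone. This is only a minimal sketch; the helper name `interpreter_bits` is made up for illustration and is not part of sklearn or joblib.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running Python interpreter in bits."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


print("Python", platform.python_version(), "-", interpreter_bits(), "bit")
# A second, commonly used check: sys.maxsize exceeds 2**32 only on 64-bit builds.
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

Running this on both the training machine and the loading machine shows whether the two interpreters actually differ in pointer size.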

The docs are very sparse on detail about this issue: https://github.com/scikit-learn/scikit-learn/pull/7899/files

Weirdly, I don't get this error when loading the Lasso model. So two questions:

1) How can I persist my model so it can be used on different architectures?

2) Can some sklearn estimators be persisted for use on different architectures while others cannot? If so, how can I tell which? The answer to "Scikits-Learn RandomForrest trained on 64bit python wont open on 32bit python" indicates that Random Forest is also affected, but it doesn't make clear which other models are.
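
To make the difference between the two models concrete, here is a rough sketch that inspects the dtypes stored in each fitted estimator. It uses synthetic data as a stand-in for the real training set, and it relies on `tree_.__getstate__()`, which is an internal detail of scikit-learn and may differ between versions, so treat it as a diagnostic aid rather than a supported API.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso

# Synthetic data, used here only as a stand-in for the real training set.
rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = X @ np.array([1.0, 2.0, 3.0])

lasso = Lasso(alpha=0.01).fit(X, y)
gbr = GradientBoostingRegressor(n_estimators=5, random_state=0).fit(X, y)

# Lasso keeps its fitted coefficients in a plain floating-point array.
print("Lasso coef_ dtype:", lasso.coef_.dtype)

# Each boosting stage wraps a Cython Tree object. Its pickled state holds a
# structured "nodes" array (internal detail, may change between versions).
first_tree = gbr.estimators_[0, 0].tree_
nodes = first_tree.__getstate__()["nodes"]
child_dtype = nodes.dtype["left_child"]
print("Tree node index dtype:", child_dtype)

# The pointer-sized integer on this platform, for comparison.
print("np.intp on this platform:", np.dtype(np.intp))
```

If the node index dtype follows the platform's pointer-sized integer, that would be consistent with the `SIZE_t` versus `long long` mismatch in the error above, whereas the Lasso coefficients are ordinary floats.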

I realize that this issue can also be mitigated by always dockerizing any application training/loading the model, but I'm publishing for educational purposes where dockerizing everything is tricky.

Many thanks!

From what I can tell, it will mostly be up to chance if a model breaks on 32 bit or not, depending on what specifics are being used. Consider using Docker containers to deliver your models. Within a Docker environment, the architecture should not matter. — Lukas Thaler, Jan 27 '20 at 09:00
