
If we serialize a random forest model using joblib on a 64-bit machine and then unpack it on a 32-bit machine, an exception is raised:

ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'

This question has been asked before: Scikits-Learn RandomForrest trained on 64bit python wont open on 32bit python. However, it has remained unanswered since 2014.

Sample code to train the model (on a 64-bit machine):

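# On scikit-learn releases before 0.18, use sklearn.grid_search and sklearn.externals.joblib instead.
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

modelPath="../"
featureVec=...
labelVec = ...
forest = RandomForestClassifier()
# param_dict holds the hyperparameter distributions to sample from (not shown)
randomSearch = RandomizedSearchCV(forest, param_distributions=param_dict, cv=10, scoring='accuracy',
                                      n_iter=100, refit=True)
randomSearch.fit(X=featureVec, y=labelVec)
model = randomSearch.best_estimator_
joblib.dump(model, modelPath)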

Sample code to unpack on a 32-bit machine:

modelPath="../"
model = joblib.load(modelPath) # ValueError thrown here

My question is: Is there a generic workaround for this problem if we have to train on a 64-bit machine and port the model to a 32-bit machine for prediction?

Edit: I tried using pickle directly instead of joblib. The same error still occurs, and it is raised while the core pickle library is unpickling the tree (for both joblib and pickle):

  File "/usr/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "sklearn/tree/_tree.pyx", line 585, in sklearn.tree._tree.Tree.__cinit__ (sklearn/tree/_tree.c:7286)
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'
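
For context, a minimal sketch of how to see the mismatch on a given machine. It assumes that `SIZE_t` in the tree code is a pointer-sized integer (like NumPy's `intp`), which is what the error message suggests; the helper names `integer_widths` and `index_type_matches_long_long` are made up for this illustration. It only uses the standard library:

```python
import ctypes
import struct


def integer_widths():
    """Report the native integer widths of the running interpreter."""
    return {
        # Size of a C pointer, in bits (32 or 64 on common platforms).
        "pointer_bits": struct.calcsize("P") * 8,
        # Pointer-sized signed integer, assumed here to correspond to SIZE_t.
        "ssize_t_bytes": ctypes.sizeof(ctypes.c_ssize_t),
        # The type named in the error message ('long long').
        "long_long_bytes": ctypes.sizeof(ctypes.c_longlong),
    }


def index_type_matches_long_long(widths):
    """True if arrays of 'long long' have the width the native index type expects."""
    return widths["ssize_t_bytes"] == widths["long_long_bytes"]


widths = integer_widths()
print(widths)
print("64-bit index arrays match native index type:",
      index_type_matches_long_long(widths))
```

On a 64-bit interpreter both types are 8 bytes wide, so the pickled index arrays fit. On a 32-bit interpreter the pointer-sized type is 4 bytes while the pickled arrays still hold 8-byte values, which is consistent with the `ValueError` above.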
Vinay Kolar
  • What happens if you stick to pickle instead of joblib? The 32/64 diffs in random-forests should be fixed (merged issue) and python's pickle should also be able to do this. Question is of course: is it a good idea to run the predictions on a 32-bit machine. Depending on your learning-params, the memory-bound will be an issue. – sascha Aug 30 '16 at 23:47
  • Thanks @sascha. Any idea which version supports this 32/64 serialization? Tried with bare pickle on python 2.7. The error seems to be in the core pickle (which joblib also uses). I have edited the post. Also, sometimes we have to act/predict on low-footprint machines or sensors due to latency reasons -- irrespective on where we do a batch learn. – Vinay Kolar Aug 31 '16 at 22:35

1 Answer


I had something similar where I wanted to train on a 64-bit Ubuntu machine and run on an old 32-bit Raspberry Pi.

In my case, I trained with scikit-learn, but predicted with treelite.

On my 64-bit machine, I trained a RandomForestClassifier and exported a mymodel.zip:

from sklearn.ensemble import RandomForestClassifier
import treelite
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype='float32')
y = np.array([0, 1, 1, 0], dtype='float32')

clf = RandomForestClassifier(n_estimators=10).fit(X, y)

# Requires `scikit-learn==1.1.0` currently:
model = treelite.sklearn.import_model_with_model_builder(clf)
model.export_srcpkg(platform="unix", toolchain="gcc",
                    pkgpath="./mymodel.zip", libname="mymodel.so")

Then, after transferring mymodel.zip to my 32-bit Raspberry Pi:

unzip mymodel.zip
cd mymodel/
make

Then, in Python:

import numpy as np
import treelite_runtime

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype='float32')
dmat = treelite_runtime.DMatrix(X)

predictor = treelite_runtime.Predictor("mymodel.so", verbose=True)
print(predictor.predict(dmat))
#     [0.3 0.5 0.6 0.2]
#     This would be the same as: `clf.predict_proba(X)[:, 1]`
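
To illustrate why this route avoids the pickle problem, here is a minimal sketch (not treelite's implementation, and not part of what I ran): a trained tree boils down to a few plain arrays, and plain arrays can be moved between machines in a format that does not care about word size. The array names mirror the attributes of scikit-learn's `tree_` object (`children_left`, `children_right`, `feature`, `threshold`), but the values below are hand-written for the XOR data above, and `predict_one` / `predict_forest` are hypothetical helpers:

```python
import json

# Hand-written (hypothetical) tree for the XOR data; -1 marks a leaf.
# "value" holds the probability of class 1 at each node.
tree = {
    "children_left":  [1, 3, 5, -1, -1, -1, -1],
    "children_right": [2, 4, 6, -1, -1, -1, -1],
    "feature":        [0, 1, 1, -2, -2, -2, -2],
    "threshold":      [0.5, 0.5, 0.5, -2.0, -2.0, -2.0, -2.0],
    "value":          [0.5, 0.5, 0.5, 0.0, 1.0, 1.0, 0.0],
}


def predict_one(tree, x):
    """Walk from the root to a leaf; go left when x[feature] <= threshold."""
    node = 0
    while tree["children_left"][node] != -1:
        if x[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["children_left"][node]
        else:
            node = tree["children_right"][node]
    return tree["value"][node]


def predict_forest(forest, x):
    """Average the per-tree probabilities, as a random forest does."""
    return sum(predict_one(t, x) for t in forest) / len(forest)


# Round-trip through JSON to stand in for moving the model between machines.
forest = json.loads(json.dumps([tree, tree]))

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
predictions = [predict_forest(forest, x) for x in X]
print(predictions)
```

Treelite goes further by generating C code for the trees and compiling it on the target, which is why the `make` step runs on the Raspberry Pi itself.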

Notes specific to 32-bit Raspberry Pi

If anyone is trying to get this running on a 32-bit Raspberry Pi in the future, there are a couple of extra steps, since it's a non-standard platform. Here is what I did to get it running with Python 3.10.6, numpy==1.23.5, scipy==1.9.3, and treelite-runtime==3.0.1 on a Raspberry Pi 3 Model B (ARMv7 Processor rev 4 (v7l)).

  1. Set up dependencies:
sudo apt install gcc g++ gfortran python3-dev python3-venv make cmake ninja-build pkg-config autoconf
  2. Increase the swap space so that scipy can compile (pre-built wheels for numpy and scipy are available; these notes are here for completeness and are based on this answer):
sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
sudo /sbin/mkswap /var/swap.1
sudo chmod 600 /var/swap.1
sudo /sbin/swapon /var/swap.1
  3. Create the Python environment:
python3 -m venv venv
source venv/bin/activate
pip install scipy==1.9.3 numpy==1.23.5 treelite-runtime==3.0.1
  4. Compile the model (repeated from earlier):
unzip mymodel.zip
cd mymodel/
make
Alexander L. Hayes