20

I train a `RandomForestRegressor` model on 64-bit Python and pickle the object. When I try to unpickle the object on 32-bit Python, I get the following error:

`ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'`

I really have no idea how to fix this, so any help would be hugely appreciated.
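A minimal sketch of the round trip that produces the error (the toy dataset here is a stand-in; the real training data isn't shown):

```python
import pickle
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Stand-in data; the actual training set is not part of the question.
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# On the 64-bit interpreter: serialize the trained model.
with open("forest.pkl", "wb") as f:
    pickle.dump(model, f)

# On the 32-bit interpreter: this load raises the ValueError above.
with open("forest.pkl", "rb") as f:
    model = pickle.load(f)
```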

Edit: more detail

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python27\lib\pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "c:\python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\python27\lib\pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "_tree.pyx", line 1282, in sklearn.tree._tree.Tree.__cinit__ (sklearn\tre
e\_tree.c:10389)
Will Beauchamp
  • 579
  • 2
  • 7
  • 18
  • I am not sure whether this should be considered a bug in the Cython Tree class that is not tolerant enough at unpickling time, a bad choice of buffer dtype, or a fundamental limitation of pickling sklearn models. – ogrisel Jan 10 '14 at 14:45
  • Has there been any progress on this? I'm finding the same problem at the moment. – martinako Nov 19 '15 at 12:52
  • As of Nov 2017 I have exactly the same issue. sklearn.__version__ : '0.19.1' – dpetrini Nov 25 '17 at 13:15

4 Answers

11

This occurs because the random forest code uses different types for indices on 32-bit and 64-bit machines. This can, unfortunately, only be fixed by overhauling the random forests code. Since several scikit-learn devs are working on that anyway, I put it on the todo list.

For now, the training and testing machines need to have the same pointer size.
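To see the size difference the answer refers to: the tree buffers are indexed with a pointer-sized integer, so its width follows the interpreter. A quick check (a sketch, with `numpy.intp` standing in for the Cython `SIZE_t` typedef):

```python
import struct
import numpy as np

# Width of a pointer in the running interpreter: 4 bytes on 32-bit, 8 on 64-bit.
print(struct.calcsize("P"))

# numpy's pointer-sized integer has the same width; a pickle written with the
# 8-byte layout cannot be read back into buffers expecting the 4-byte one.
print(np.dtype(np.intp).itemsize)
```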

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • @larsmans Would you mind a question about the current state of the word-size-independent **`.dump()`** / **`.load()`** methods in `scikit-learn`? Have you heard of or met any ***workaround*** in the meantime (be it via `marshal` or other means)? Would be very glad if you find it useful to share such additional knowledge, larsmans. Thanks. – user3666197 May 31 '15 at 14:52
  • Btw, pull_2732 (referred to above) mentions a debate on `memalloc` issues with "big trees". Yes, these are important. It matters to keep `.load()`-s low-profile once moving `pickle.dump()`-ed beasts from their more-than-3.2GB disk representations back into RAM. Hope to hear fresh news from your team (I've seen not much activity on S/O since 2014-01). **Stay tuned. You do a great job!** – user3666197 May 31 '15 at 16:17
  • 1
    does this particular issue have a solution now? facing the same trouble currently – Varun Rajan Jun 28 '17 at 12:40
  • @FredFoo, is there any solution for this? – Ravi Nov 12 '19 at 05:39
  • Yearly check for ... is there a solution for this? – user4446237 Nov 17 '20 at 17:16
1

For ease, please use a 64-bit Python version to deserialize your model. I faced the same issue recently, and it was resolved after taking that step.

So try running it on a 64-bit version. I hope this helps.
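A quick way to confirm which build is actually running before loading the pickle (not part of the original answer, just a standard check):

```python
import platform
import sys

print(platform.architecture()[0])   # '64bit' on a 64-bit build, '32bit' otherwise
print(sys.maxsize > 2**32)          # True only on a 64-bit interpreter
```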

0

I fixed this problem by training the model on the same machine. I was training the model in a Jupyter Notebook (Windows PC) and trying to load it on a Raspberry Pi, but I got the error. So I trained the model on the Raspberry Pi instead, and that fixed the problem.

kamil3di
  • 1
  • 1
0

I had the same problem when I trained the model with 32-bit Python 3.7.0 installed on my system. It was solved after installing the 64-bit Python 3.8.10 version and training the model again.

Floyd Fernandes
  • 147
  • 1
  • 4