I'm the klepto author. If you are indeed just trying to pickle a numpy array, the best approach is to use the built-in dump method on the array (unless the array is too large to fit within memory constraints).
Almost any code that does serialization uses one of the serialization packages (dill, cloudpickle, or pickle), unless there's a serialization method built into the object itself, as there is in numpy. joblib uses cloudpickle, and both cloudpickle and dill utilize the internal serialization that a numpy array itself provides (pickle does not use it, so the serialization bloats and can cause memory failures).
>>> import numpy as np
>>> a = np.random.random((1500,1500,1500,1))
>>> a.dump('foo.pkl')
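To read the file back, numpy.load works on the pickled file; recent numpy versions require allow_pickle=True, since the file is a pickle rather than a .npy:
>>> a = np.load('foo.pkl', allow_pickle=True)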
If the above still gives you a memory error, then joblib, klepto, dill, or any other serialization package really can't help you unless you break up the array into smaller chunks -- or potentially use a dask array (which is designed for large array data). I think your array is large enough that it should cause a memory error (I tested it on my own system) even with the above optimally efficient method, so you'll either need to break the array into chunks, or store it as a dask array.
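A rough sketch of the dask route, assuming you have dask installed; the chunk size here is arbitrary, and to_npy_stack writes one .npy file per chunk so the whole array should never have to sit in memory at once:
>>> import dask.array as da
>>> x = da.random.random((1500,1500,1500,1), chunks=(100,1500,1500,1))
>>> da.to_npy_stack('foo/', x, axis=0)  # one .npy file per chunk along axis 0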
To be clear, klepto is intended for large non-array data (like tables or dicts), while dask is intended for large array data.
Another option is to use a numpy.memmap array, which directly writes the array to a file, bypassing memory. These are a bit complex to use, and are essentially what dask attempts to do for you with a simple interface.
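A minimal numpy.memmap sketch (the filename and dtype are just placeholders): the file is created on disk up front, and you fill it a slice at a time so only one slice needs to be in memory:
>>> import numpy as np
>>> m = np.memmap('foo.dat', dtype='float64', mode='w+', shape=(1500,1500,1500,1))
>>> m[0] = np.random.random((1500,1500,1))  # fill one slice at a time
>>> m.flush()  # push pending writes through to the file on disk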