
I use python3 to pickle a dictionary that maps text keys to either text or scipy.sparse.lil_matrix (list of lists format) values with an integer dtype.

Is that file cross-platform (across machines that have python3, scipy, and numpy installed)?

Is a pickled dictionary containing just text and numpy arrays cross-platform?

pickle is cross-platform: Is pickle file of python cross-platform?

numpy's .npy file format is cross platform: Is numpy.save cross platform? ...

I am not sure what happens if I manually pickle a numpy array. I checked an integer numpy array on two different machines with Intel CPUs, and the values remained the same.

Manually pickling a numpy array:

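import numpy as np
import pickle

# int64 is needed because the upper bound 2**63 - 1 does not fit in int32;
# size=10 makes the result an array rather than a single scalar.
x = np.random.randint(0, 2**63 - 1, size=10, dtype=np.int64)
d = {
    'x': x,
    'blah': "blah blah blah"
}
# Write the dictionary to disk in binary mode.
with open('bomb.pickle', 'wb') as f:
    pickle.dump(d, f)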
hamster on wheels
  • It should be cross platform but to unpickle the content you need to have scipy installed as well. – Radek Hofman Jul 29 '17 at 18:31
  • I mean that on other computers with a different OS, if scipy is installed, the unpickling should work. – Radek Hofman Jul 29 '17 at 18:32
  • It depends on how pickle handles a numpy array object. If it copies the raw bytes from memory, a change of endianness will corrupt the values even for integers, and floating-point format compatibility across CPUs cannot be assumed for a memory dump. I don't know how the C part of a numpy array gets saved by pickle. The Python part of the dictionary and numpy array will be cross-platform when pickled. If a manually pickled numpy array is cross-platform, then manually pickled scipy sparse matrices, which are built on numpy arrays, will be cross-platform as well. – hamster on wheels Jul 29 '17 at 18:38
  • Have you tried hdf5 instead? E.g. using pytables or h5py. This should take care of all Python and platform differences. – denfromufa Jul 30 '17 at 03:42
  • Those are old parts of the code. The new parts of the code already use hdf5. – hamster on wheels Jul 30 '17 at 16:05

1 Answer

Manual pickling of a numpy array with an integer dtype is cross-platform, provided we don't have to fix imports.

This is because if an object has a __reduce__ method, pickle will use it.

numpy.ndarray.__reduce__'s doc https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.reduce.html

A related question explains how a numpy ndarray gets pickled, although it does not reference the source code: How does Python 3 know how to pickle extension types, especially Numpy arrays?

In the past, pickling with protocol 0 was not portable for floats, NaN, and Inf: https://mail.python.org/pipermail/tutor/2010-May/075980.html

Official doc on pickling floats on python 3.1 with text protocol: https://docs.python.org/3/whatsnew/3.1.html

The new algorithm depends on certain features in the underlying floating point implementation. If the required features are not found, the old algorithm will continue to be used. Also, the text pickle protocols assure cross-platform portability by using the old algorithm.
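As a quick sanity check of the float behaviour on a current Python 3, the following sketch round-trips a few awkward values through every protocol the interpreter supports, including text protocol 0. The helper name roundtrip_ok is made up for this example, and a check on one machine cannot by itself demonstrate cross-platform portability.

```python
import math
import pickle

values = [0.1, 1e308, float("inf"), float("-inf"), float("nan")]

def roundtrip_ok(value, protocol):
    """Return True if value survives a pickle round trip unchanged."""
    restored = pickle.loads(pickle.dumps(value, protocol=protocol))
    if math.isnan(value):
        # NaN never compares equal to itself, so test it separately.
        return math.isnan(restored)
    return restored == value

# Check every protocol the running interpreter supports, including protocol 0.
results = {
    protocol: all(roundtrip_ok(v, protocol) for v in values)
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}
print(results)
```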

numpy's source code for writing to a file is in format.write_array and npyio.save

(https://github.com/numpy/numpy/blob/v1.13.0/numpy/lib/format.py, https://github.com/numpy/numpy/blob/v1.13.0/numpy/lib/npyio.py#L435-L512)

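The header is saved as text. In format.write_array, the data is written with pickle.dump only when the array's dtype contains Python objects and allow_pickle is true (the default for numpy.save in this version); otherwise the raw data buffer is written directly. For the object case, I found: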

pickle.dump(array, fp, protocol=2, **pickle_kwargs)

format.py also says:

The .npy format is the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. The format stores all of the shape and dtype information necessary to reconstruct the array correctly even on another machine with a different architecture.
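For comparison, a small sketch of an .npy round trip through an in-memory buffer. It passes allow_pickle=False to show that a plain integer array can be saved and loaded without pickle being involved; the variable names are illustrative only.

```python
import io
import numpy as np

original = np.arange(6, dtype=np.int64).reshape(2, 3)

# Save to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
np.save(buffer, original, allow_pickle=False)

# Every .npy file starts with the magic string b"\x93NUMPY".
raw = buffer.getvalue()
magic = raw[:6]

# Load the array back; shape and dtype are reconstructed from the header.
buffer.seek(0)
restored = np.load(buffer, allow_pickle=False)
```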

So manual pickling of a numpy array is cross-platform (if we don't have to fix imports): np.save itself relies on pickle for object arrays and is documented as cross-platform.

np.save uses protocol 2. pickle.DEFAULT_PROTOCOL is 3 on both of the Python 3 installations on the two machines I tested.
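If the protocol version matters, for example because the reading machine runs an older Python, it can be pinned explicitly when writing. The sketch below assumes a dictionary shaped like the one in the question and is only one way to do this.

```python
import pickle
import numpy as np

data = {"x": np.arange(4, dtype=np.int32), "blah": "blah blah blah"}

# The default differs between Python versions, so print it rather than assume it.
print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)

# Pin protocol 2 explicitly so the writer does not depend on the default.
payload = pickle.dumps(data, protocol=2)

# The reader does not pass a protocol; pickle detects it from the stream.
restored = pickle.loads(payload)
```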

hamster on wheels
  • Hopefully that means manual pickling of a scipy sparse matrix also works. – hamster on wheels Jul 29 '17 at 18:53
  • `np.save` writes a block with attributes like shape and dtype, followed by an image of the databuffer (which can read as a memmap). Dtype objects are saved via pickle. Conversely a pickle of an array is its `save`. – hpaulj Jul 30 '17 at 06:22
  • I'm not sure about pickle for sparse classes; they are not a subclass of `np.ndarray`. Recent scipy versions have a `sparse.save_npz` function, which creates a `npz` archive with the required arrays and attributes. I recommend looking at its code. `scipy.io.savemat` can also write a sparse matrix in a matlab compatible format. Following the `save_npz` model I could write a sparse matrix with `h5py`. – hpaulj Jul 30 '17 at 06:28
  • `save_npz` does not save `lil` or `dok` formats, because the data isn't in numeric array format. `h5py` would also have problems with these. `coo` and `csr` are better for array oriented saving. – hpaulj Jul 30 '17 at 06:57
  • I suspect, but should verify, that all `sparse` classes depend on the inherited `.__reduce__`, thus pickling in the same way as most user defined classes. – hpaulj Jul 30 '17 at 11:30
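
The two options discussed in these comments can be sketched as follows, assuming scipy is installed. The matrix contents are made up for illustration, and the check runs on one machine only, so it shows that both routes round-trip correctly rather than that they are portable.

```python
import io
import pickle
import numpy as np
from scipy import sparse

lil = sparse.lil_matrix((3, 4), dtype=np.int64)
lil[0, 1] = 7
lil[2, 3] = -2

# Option 1: pickle the lil_matrix directly.
from_pickle = pickle.loads(pickle.dumps(lil, protocol=2))

# Option 2: convert to CSR first, since save_npz works on array-based formats,
# then convert back to LIL after loading.
buffer = io.BytesIO()
sparse.save_npz(buffer, lil.tocsr())
buffer.seek(0)
from_npz = sparse.load_npz(buffer).tolil()
```

Converting to CSR before saving avoids the `lil` limitation mentioned above, at the cost of one extra conversion on each side.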