
NumPy arrays, being extension types (i.e., types defined in C extensions using the C API), declare additional fields outside the scope of the Python interpreter (for example the `data` attribute, which is a buffer structure, as documented in NumPy's array interface).
To be able to serialize them, Python 2 used the `__reduce__` method as part of the pickle protocol, as stated in the doc, and explained here.

But even though `__reduce__` still exists in Python 3, the Pickle protocol section (and Pickling and unpickling extension types a fortiori) was removed from the doc, so it is unclear what does what.
Moreover, there are additional entries that relate to pickling extension types:

  • `copyreg`, described as a "Pickle interface constructor registration for extension types", although there is no mention of extension types in the `copyreg` module documentation itself.
  • PEP 3118 -- Revising the buffer protocol, which introduced a new buffer protocol for Python 3 (and maybe automates pickling for objects supporting this buffer protocol).
  • New-style classes: one can assume that new-style classes have an influence on the pickling process.
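For reference, `copyreg` registration is not limited to extension types; a minimal sketch of how it works for any class (the `Frozen` class and `reduce_frozen` helper here are hypothetical stand-ins, not part of any library):

```python
import copyreg
import pickle

class Frozen:
    """Hypothetical class standing in for an extension type."""
    def __init__(self, value):
        self.value = value

def reduce_frozen(obj):
    # A copyreg reduce function: return the (callable, args) pair
    # that pickle should use to recreate the object.
    return (Frozen, (obj.value,))

# Register the reduce function for the type with copyreg.
copyreg.pickle(Frozen, reduce_frozen)

restored = pickle.loads(pickle.dumps(Frozen(42)))
print(restored.value)  # -> 42
```

The registered function plays the same role as a `__reduce__` method defined on the class itself.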

So, how does all of this relate to NumPy arrays?

  1. Do NumPy arrays implement special methods, such as `__reduce__`, to inform Python how to pickle them (or register picklers via `copyreg`)? NumPy objects still expose a `__reduce__` method, but it may be there only for compatibility reasons.
  2. Does NumPy use Python C-API structures that pickle supports out of the box (like the new buffer protocol), so that nothing supplementary is necessary in order to pickle NumPy arrays?
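Whichever mechanism is responsible, the observable behaviour is easy to check; a minimal round-trip sketch:

```python
import pickle
import numpy as np

# A NumPy array pickles out of the box, despite being an extension type.
a = np.arange(3, dtype=np.int64)
data = pickle.dumps(a)       # serialize via the pickle protocol
b = pickle.loads(data)       # deserialize

print(np.array_equal(a, b))  # -> True: the round-trip preserves the values
```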
Phylliade
  • `__reduce__` still exists in Python 3. – Martijn Pieters Jul 29 '17 at 17:04
  • …[and here is the piece of documentation where `__reduce__` is mentioned in Python 3](https://docs.python.org/3/library/pickle.html?highlight=pickle#object.__reduce__). – Phillip Jul 29 '17 at 17:08
  • The point is that the mentions of extension types in the `reduce` doc have been removed, even though `reduce` still exists. But what was said is still true, as stated by the accepted answer (and in the following comments). – Phylliade Jul 29 '17 at 18:08

1 Answer


Python 3's pickle still supports `__reduce__`; it is covered under the Pickling Class Instances section.

Numpy's support has not changed in this regard; it implements __reduce__ on arrays to support pickling in either Python 2 or 3:

>>> import numpy
>>> numpy.array(0).__reduce__()
(<built-in function _reconstruct>, (<class 'numpy.ndarray'>, (0,), b'b'), (1, (), dtype('int64'), False, b'\x00\x00\x00\x00\x00\x00\x00\x00'))

A three-element tuple is returned, consisting of a function object to recreate the value, a tuple of arguments for that function, and a state tuple to pass to `newinstance.__setstate__()`.
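That three-element contract can be replayed by hand; a sketch of roughly what the unpickler does with the tuple:

```python
import numpy

a = numpy.arange(4)
func, args, state = a.__reduce__()

# Recreate the value the way the unpickler would:
b = func(*args)        # call the reconstructing function with its arguments
b.__setstate__(state)  # then restore the array's shape, dtype, and data

print((a == b).all())
```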

Martijn Pieters
  • So what is said in [Pickling and Unpickling extension types](https://docs.python.org/2.7/library/pickle.html#pickling-and-unpickling-extension-types) is still true? Why did they move the `__reduce__` doc from `extension types` to the more general `class instances` section? – Phylliade Jul 29 '17 at 17:38
  • @Phylliade: yes, everything is still true. The method is not exclusive to extension types (and the line between custom Python classes and extension types has largely been blurred). – Martijn Pieters Jul 29 '17 at 17:42
  • @Phylliade: `__reduce__` is the lower-level copy protocol implementation; custom Python classes should implement the higher-level methods (`__getnewargs_ex__` / `__getstate__` / `__setstate__`) if possible; a default `__reduce__` implementation then uses those. – Martijn Pieters Jul 29 '17 at 17:45
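To illustrate the higher-level hooks mentioned in the last comment, a minimal sketch (the `Point` class is hypothetical, purely for demonstration):

```python
import pickle

class Point:
    """Hypothetical class using the higher-level pickle hooks."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __getstate__(self):
        # Return a picklable representation of the instance's state.
        return {"x": self.x, "y": self.y}

    def __setstate__(self, state):
        # Restore the instance from the state produced by __getstate__.
        self.x, self.y = state["x"], state["y"]

p = pickle.loads(pickle.dumps(Point(1, 2)))
print(p.x, p.y)  # -> 1 2
```

The default `object.__reduce_ex__` picks these methods up automatically, so no explicit `__reduce__` is needed here.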