1

I have a class, shown below, that inherits from datetime.date

class my_class(datetime.date):
    def __new__(cls, arg1=None):
        print ("arg1: ", arg1)
        return super().__new__(cls, 2015, 11, 2)

When I copy (or deepcopy) an instance of this class, as shown below,

import copy
my_obj = my_class(5)
copy.copy(my_obj)

I get the following output

arg1:  5
arg1:  b'\x07\xdf\x0b\x02'

So clearly some bytes object is passed as the first argument when doing a copy (or deepcopy). I know that the copy function works by passing arguments to the constructor of the object in question such that an identical object will be created...so why is it passing these bytes objects?

This behaviour seems to occur only when inheriting from immutable objects, since when I tested this when inheriting from a list object, the copy function did not pass a bytes object.

(Note that the class I defined above is a very (very!) simplified case of a real class I am using, that also inherits from datetime.date)

Thanks in advance

NOTE

I am using Python 3.5.1

Akshat Mahajan
  • 9,543
  • 4
  • 35
  • 44
Konrad
  • 2,207
  • 4
  • 20
  • 31
  • Please check your class because when we do `my_obj = my_class(5)` we get `return super().__new__(cls, 2015, 11, 2) TypeError: super() takes at least 1 argument (0 given)` for the current class definition – trinchet Apr 25 '16 at 21:00
  • @trinchet I had no such issues when running that command (and I copied and pasted that code from my editor into the question bar on SO). I am using Python 3.5.1, with WinPython 64-bit. – Konrad Apr 25 '16 at 21:11
  • @trinchet [In Python 3.x, super() can be called with no arguments](http://stackoverflow.com/questions/19608134/why-is-python-3-xs-super-magic). OP is using 3.5, so would recommend using that. – Akshat Mahajan Apr 25 '16 at 21:21

1 Answers1

2

The reason this happens is because internally copy and deepcopy make use of pickle to facilitate object state serialisation. From the docs:

This module does not copy types like module, method, stack trace, stack frame, file, socket, window, array, or any similar types. It does “copy” functions and classes (shallow and deeply), by returning the original object unchanged; this is compatible with the way these are treated by the pickle module.

...

In fact, the copy module uses the registered pickle functions from the copyreg module.

Why does copy need to pickle? Because:

A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

To do that, though, copy needs to be able to understand the object hierarchy. Pickling is designed to solve that problem by mapping a hierarchy to a standard serialised format.

This also is where the byte objects come from.

“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.

It would appear that, for whatever reason (maybe datetime.date implements __new__ in a special way?), the unpickling isn't reproducing the integer argument passed to __new__.

Why? I have no earthly idea. Poke around a little bit and see what you can dig up.

The Solution

The docs also make it pretty clear what to do if copy() or deepcopy() are misbehaving for you - define your own copy implementation!

In order for a class to define its own copy implementation, it can define special methods __copy__() and __deepcopy__().

The former is called to implement the shallow copy operation; no additional arguments are passed.

The latter is called to implement the deep copy operation; it is passed one argument, the memo dictionary.

If the __deepcopy__() implementation needs to make a deep copy of a component, it should call the deepcopy() function with the component as first argument and the memo dictionary as second argument.

Community
  • 1
  • 1
Akshat Mahajan
  • 9,543
  • 4
  • 35
  • 44
  • Thanks for the insight. I am assuming that a deepcopy also needs to understand the object hierarchy, since the `__new__` function in that case also ends up with a serialized object? (and of course, it still needs to produce a new object with the current hierarchy preserved, although now with references to new objects) – Konrad Apr 25 '16 at 22:15
  • Yes, deepcopy also uses pickle to map the object hierarchy. – Akshat Mahajan Apr 25 '16 at 22:44