0

According to the post "What is a fast pythonic way to deepcopy just data from a python dict or list?", msgpack is 10 times faster than copy.deepcopy, but I cannot figure out how to use it.

I tried:

item2 = msgpack.unpack(msgpack.packb(item1))

In place of:

item2 = copy.deepcopy(item1)

But I get:

File "msgpack/_unpacker.pyx", line 228, in msgpack._unpacker.unpack
AttributeError: 'bytes' object has no attribute 'read'

The documentation, located here: http://msgpack-python.readthedocs.io/en/latest/api.html, is incomprehensible, but that's no surprise because just about all computer documentation is incomprehensible to me.

fred russell
  • If you can't follow the documentation for msgpack, I am 100% certain that trying to use it to hack a marginally faster version of deepcopy is absolutely the wrong thing. Write the simplest, most straightforward code you can, and worry about performance later. – Paul Becotte Jan 11 '18 at 13:50
  • No thank you. My program just slowed down by a factor of 3 due to deepcopy. If I can speed it up to the way it was before, then I'm doing that. – fred russell Jan 11 '18 at 13:55
  • Up to you :) There are implications to choosing serialization/deserialization over deepcopy, and it's hard to say if your use case will be affected by those. THAT is why it is faster: it is not doing a lot of the stuff deepcopy is doing. Further, in the event that you really don't need the stuff deepcopy is doing, the most performant thing is almost certainly to write a simple method to copy exactly what you do need, since that will wind up doing less than msgpack or json.loads(json.dumps(x)) would do (a sketch of that idea follows these comments). – Paul Becotte Jan 11 '18 at 14:03
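
To make that last suggestion concrete, here is a minimal sketch of a hand-rolled copy. It assumes the data is plain JSON-like nesting of dicts, lists, and scalars, with no shared references, cycles, or custom classes (those are exactly the cases deepcopy spends its time handling); the name copy_data is just for illustration:

def copy_data(obj):
    # Rebuild dicts and lists recursively; anything else is assumed to be
    # an immutable scalar (int, float, str, bool, None) and is reused as-is.
    if isinstance(obj, dict):
        return {k: copy_data(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [copy_data(v) for v in obj]
    return obj

item2 = copy_data(item1)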

2 Answers

1

If you look at the docs, msgpack.unpack expects a stream, not a block of bytes:

>>> help(msgpack.unpack)
Help on built-in function unpack in module msgpack._unpacker:

    unpack(...)
        unpack(stream, object_hook=None, list_hook=None, bool use_list=1, encoding=None, unicode_errors='strict', object_pairs_hook=None, ext_hook=ExtType, Py_ssize_t max_str_len=2147483647, Py_ssize_t max_bin_len=2147483647, Py_ssize_t max_array_len=2147483647, Py_ssize_t max_map_len=2147483647, Py_ssize_t max_ext_len=2147483647)
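
So, for example, unpack does accept the packed bytes once they are wrapped in a file-like object (a quick illustration; the dict here is just a stand-in for your data):

>>> import io
>>> packed = msgpack.packb({"a": 1})
>>> item = msgpack.unpack(io.BytesIO(packed))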

Just as you used packb to pack the object, you should use unpackb to unpack it:

>>> item2 = msgpack.unpackb(msgpack.packb(item1))
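
A caveat, assuming msgpack-python 0.5.x (the current release at the time): by default, unpackb returns msgpack strings as bytes rather than str, which can make text data look garbled. Passing raw=False (or encoding='utf-8' on older releases) decodes them back to str:

>>> item1 = {"name": "widget", "sizes": [1, 2, 3]}
>>> item2 = msgpack.unpackb(msgpack.packb(item1), raw=False)
>>> item2 == item1 and item2 is not item1
True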
larsks
  • Ok, I used `item2 = msgpack.unpackb(msgpack.packb(item1))` and no error was thrown but it seriously garbled my data and messed up my program. So I think I'm probably going to give up on msgpack. – fred russell Jan 11 '18 at 14:06
  • I tried using json `json.loads(json.dumps(object))` and although no error was thrown it broke my code. Is it supposed to be the same as copy.deepcopy? – fred russell Jan 11 '18 at 14:22
  • Actually, I've noticed that the problem stems from another list embedded within the list. Maybe `json` does not deepcopy the embedded lists – fred russell Jan 11 '18 at 14:41
  • Neither `msgpack` nor `json` should garble your data. If you're getting odd results, it would help if you could post a new question that includes a complete reproducer. That will make it easier for us to help you out. – larsks Jan 11 '18 at 18:06
  • Sorry for using sloppy language. What I meant was that msgpack turns unicode into raw strings which I cannot read. And json and msgpack do not preserve the identity of references, as in this example (sorry about the formatting, but I do not know how to put code into comments):
    list1 = [[[1, 2]], [2, 3]]
    list2 = list1
    list3 = [list1, list2]
    list4 = copy.deepcopy(list3)
    list4[0][0][0][0] = 4
    list5 = json.loads(json.dumps(list3))
    list5[0][0][0][0] = 4
    >>> list5[1][0][0][0]
    1
    >>> list4[1][0][0][0]
    4
    – fred russell Jan 12 '18 at 04:51
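
A stripped-down version of the identity point in that last comment (the variable names are illustrative only):

>>> import copy, json
>>> shared = [1, 2]
>>> data = [shared, shared]            # two references to the same list
>>> dc = copy.deepcopy(data)
>>> dc[0] is dc[1]                     # deepcopy memoizes shared objects
True
>>> js = json.loads(json.dumps(data))
>>> js[0] is js[1]                     # the roundtrip duplicates them
False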
0

You are using packb(), which returns packed bytes, so to unpack them you should use unpackb():

item2 = msgpack.unpackb(msgpack.packb(item1))
Suresh