I have a block of msgpack'd data that is created as shown below:

#!/usr/bin/env python

from io import BytesIO
import msgpack

packer = msgpack.Packer()
buf = BytesIO()

buf.write(packer.pack("foo"))
buf.write(packer.pack("bar"))
buf.write(packer.pack("baz"))

Later in my app (or a different app) I need to unpack the first two elements but want access to the third element STILL packed. The only way I have found to do that so far is to repack this third element as shown below, which is rather inefficient.

buf.seek(0)
unpacker = msgpack.Unpacker(buf)
item1 = unpacker.unpack()
item2 = unpacker.unpack()
item3 = unpacker.unpack()

packed_item3 = msgpack.packb(item3)

This gets me where I want, but I would prefer to access this last item directly so I can pass it on, already packed, to where it needs to go.

mr19

2 Answers

Since your packs will not be of constant size after msgpack encoding, you can use an identifiable sequence of bytes as a separator between packs. When you need direct access to the Nth pack, still in its packed state, you iterate over your byte array; the Nth pack lies after the (N-1)th separator. This has O(n) complexity, because you have to scan the byte array up to the required pack. A string example with "####" as the separator would look like:

"pack1####pack2####pack3####pack4####pack5...."
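A minimal sketch of that layout, assuming the chosen separator never occurs inside any packed payload (msgpack output is binary, so this is not guaranteed in general). The names `SEPARATOR`, `join_packs`, and `nth_pack` are hypothetical helpers for illustration, not part of msgpack:

```python
import msgpack

# Hypothetical marker; it must never appear inside a packed payload.
SEPARATOR = b"####"

def join_packs(items):
    """Pack each item and terminate it with the separator."""
    return b"".join(msgpack.packb(item) + SEPARATOR for item in items)

def nth_pack(blob, n):
    """Return the n-th (0-based) pack, still in its packed form."""
    start = 0
    # Skip past n separators to reach the start of the wanted pack.
    for _ in range(n):
        start = blob.index(SEPARATOR, start) + len(SEPARATOR)
    end = blob.index(SEPARATOR, start)
    return blob[start:end]

blob = join_packs(["foo", "bar", "baz"])
packed_item3 = nth_pack(blob, 2)
```

The scan is linear in the size of the data before the requested pack, which matches the O(n) cost described above.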
DhruvPathak
  • Would require change in format which is not possible at this point. Was able to do this in C++ but looks like in Python I'll just have to repack as I prefer to keep the code simple and understandable. Just not worth the added complexity for the minimal efficiency gains. – mr19 Aug 10 '16 at 14:53

See http://pythonhosted.org/msgpack-python/api.html#msgpack.Unpacker.skip

packed_item3 = None
def callback(b):
    global packed_item3
    packed_item3 = b
unpacker.skip(write_bytes=callback)

But the write_bytes option will be deprecated, and other msgpack implementations don't have such an API.
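If the existing format cannot change and write_bytes is unavailable, one possible alternative is to record byte offsets around a skip and slice the original bytes. This is only a sketch, assuming a msgpack release that provides `Unpacker.tell()` and that the whole packed block is available in memory as `data`:

```python
import msgpack

# Same layout as in the question: three objects packed back to back.
data = msgpack.packb("foo") + msgpack.packb("bar") + msgpack.packb("baz")

unpacker = msgpack.Unpacker()
unpacker.feed(data)

item1 = unpacker.unpack()
item2 = unpacker.unpack()

# tell() reports how many bytes the unpacker has consumed so far.
start = unpacker.tell()
unpacker.skip()          # step over the third object without building it
end = unpacker.tell()

# Slice the still-packed third object out of the original bytes.
packed_item3 = data[start:end]
```

This avoids constructing and re-packing the third object, but it relies on having the raw bytes at hand rather than only a stream.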

A more common approach is double-packing.

buf.write(msgpack.packb(msgpack.packb(item3)))

This way, unpacking the outer layer gives you packed_item3 directly, without unpacking the item itself. The same approach works with other msgpack implementations.

For example, fluentd uses this technique to achieve high throughput.
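A round-trip sketch of double-packing, assuming the writer can be changed to emit the third item this way. Passing `use_bin_type=True` is an assumption made here so that the inner bytes are stored as msgpack's bin type and come back as `bytes`:

```python
from io import BytesIO
import msgpack

buf = BytesIO()
buf.write(msgpack.packb("foo"))
buf.write(msgpack.packb("bar"))

# Pack the third item once, then pack those bytes again as a binary blob.
inner = msgpack.packb("baz")
buf.write(msgpack.packb(inner, use_bin_type=True))

buf.seek(0)
unpacker = msgpack.Unpacker(buf)
item1 = unpacker.unpack()
item2 = unpacker.unpack()

# Unpacking the outer layer yields the inner, still-packed bytes.
packed_item3 = unpacker.unpack()

# The receiver can unpack it later, whenever it needs the value.
item3 = msgpack.unpackb(packed_item3)
```

The cost is a few extra header bytes per double-packed item and a format change that every reader has to know about.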

methane
  • Unfortunately can't change format as it's already in use and this was an add-on app I wrote to archive a single piece of the msgpack'd data. – mr19 Aug 10 '16 at 14:50
  • msgpack is not designed only for your special case. If you can't change format, what you can do is (a) keep repacking, or (b) use `.skip(write_bytes=callback)` and don't upgrade library version. – methane Aug 13 '16 at 22:58
  • It works on the C side; see the response to my question. I'll just stick with re-packing, as there is only one special case where I need this functionality anyway. http://stackoverflow.com/questions/33272500/get-pointer-to-and-length-of-element-in-msgpack-array-from-c-c – mr19 Aug 14 '16 at 16:06