I have a list containing entries of binary data, each of arbitrary length. I would like to store them all in one large binary string. Since an entry may conceivably contain any sequence of bytes, including whatever character I might choose as a separator, how can I pack this list into a single string that still has distinct entries?
-
_store this all in one large binary string_ ... in memory or on disk? _still has distinct entries_ ... meaning still addressable as separate python objects or something serialized to storage that can be put back into python objects later? – tdelaney Jan 01 '16 at 01:44
-
@tdelaney what I am actually trying to do is pack a list of very large integers. `struct.pack` doesn't support packing integers above a certain size. I can convert the ints into binary data, but this has no fixed size, hence the issue. So no, the objects don't need to be accessible while packed. – trevorKirkby Jan 01 '16 at 01:46
-
have you considered `json.dump(list_of_big_integers)`? If performance is the issue then you should provide more details about your task. – jfs Jan 01 '16 at 03:57
-
@J.F.Sebastian I am actually trying to convert a two dimensional array of big numbers to a string for a cipher algorithm. There are many reasons why a string is preferable to a list or numpy array for ciphertext. This is why performance is a concern, because if the algorithm handles a lot of data, I don't want to greatly increase the size without good reason. – trevorKirkby Jan 02 '16 at 19:42
-
@someone-or-other: if your application is related to cryptography then the more reasons to use existing algorithms, data formats instead of inventing your own. – jfs Jan 03 '16 at 00:44
-
True enough. This is why I prefer an "eval"-able string representation over some new delimiter protocol I would write myself. This representation of a sequence is well established by the python interpreter. – trevorKirkby Jan 03 '16 at 01:03
4 Answers
The `pickle` protocol should do it. `dump` writes to a file and `dumps` writes to a string.
import pickle

mylist = [...]  # list of large integers
pickle.dump(mylist, open('somefile', 'wb'), protocol=2)
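Since the answer mentions that `dumps` gives you a string, a minimal in-memory round trip might look like this (the sample values and variable names are just for illustration):

import pickle

mylist = [2**4096 + 1, -(3**1000)]        # arbitrarily large integers
blob = pickle.dumps(mylist, protocol=2)   # one binary string holding the whole list
restored = pickle.loads(blob)             # back to a list of ints
assert restored == mylist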

You cannot use a single character to separate them, for the reason you say. You could designate a special separator character, say `0x00`. Then you would also need a way to escape any `0x00` bytes that appear in the data, and you would also need to escape the escape character anywhere it shows up.
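A minimal sketch of such an escaping scheme, assuming `0x00` as the separator and `0x01` as the escape byte (both choices, and the function names, are arbitrary):

SEP, ESC = b'\x00', b'\x01'

def pack_escaped(entries):
    # Escape the escape byte first, then the separator, then join on the separator.
    return SEP.join(e.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) for e in entries)

def unpack_escaped(blob):
    entries, current, i = [], bytearray(), 0
    while i < len(blob):
        b = blob[i:i + 1]
        if b == ESC:                      # next byte is a literal separator or escape
            current += blob[i + 1:i + 2]
            i += 2
        elif b == SEP:                    # unescaped separator: entry boundary
            entries.append(bytes(current))
            current = bytearray()
            i += 1
        else:
            current += b
            i += 1
    entries.append(bytes(current))
    return entries

One caveat: an empty list and a list containing a single empty entry both pack to the empty string, so that edge case needs handling; a length prefix (as in the next answer) sidesteps it.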

You are probably better off writing a byte count for the length of the objects to follow than using a delimiter.
If pure space efficiency isn't critical, another way that might work is to use pickle instead. One more option is to base64-encode the bytes and then use a delimiter outside the base64 character set.
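A minimal sketch of the byte-count (length-prefix) idea, assuming each entry is prefixed with its length as a 4-byte big-endian unsigned integer (the function names are just for illustration):

import struct

def pack_prefixed(entries):
    # Prefix each chunk with its length as an unsigned 32-bit big-endian integer.
    return b''.join(struct.pack('>I', len(e)) + e for e in entries)

def unpack_prefixed(blob):
    entries, i = [], 0
    while i < len(blob):
        (length,) = struct.unpack('>I', blob[i:i + 4])
        entries.append(blob[i + 4:i + 4 + length])
        i += 4 + length
    return entries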

-
the first option sounds like [netstring format](http://stackoverflow.com/a/31862186/4279). – jfs Jan 01 '16 at 03:50
-
Writing a byte count length would still require a delimiter if you are storing your byte count in the same string as the data... I agree that byte counts have certain advantages though. – trevorKirkby Jan 02 '16 at 19:50
-
My point was that you are not using a delimiter in a location where data is expected. It is possible to not need a delimiter at all if you use a fixed-width byte count, especially if it is stored as a binary value such as a 32-bit or 16-bit integer. – David Maust Jan 02 '16 at 19:53
I'm not sure why you need the output to be in binary, so this might not work for you. However, you can write your data to a single string using zlib:
>>> import zlib
>>> l=[bin(i) for i in range(10)]
>>> zlib.compress(str(l))
'x\x9c\x8bV7H2P\xd7Q\x00R\x86P\n\xc6\x85\xf3a\x02\x060\x11\x84\x12\x84\x1a\xb8"\xa0\xaaX\x00\xe9\x95\x11\x14'
Then you can decompress easily:
>>> zlib.decompress(zlib.compress(str(l)))
"['0b0', '0b1', '0b10', '0b11', '0b100', '0b101', '0b110', '0b111', '0b1000', '0b1001']"
To turn it back into a list from there, you can use eval:
>>> new_l=eval(zlib.decompress(zlib.compress(str(l))))
>>> new_l
['0b0', '0b1', '0b10', '0b11', '0b100', '0b101', '0b110', '0b111', '0b1000', '0b1001']
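The session above is Python 2, where `zlib.compress` accepts a `str` directly. A rough Python 3 sketch of the same idea, encoding the `repr` before compressing and using `ast.literal_eval` instead of bare `eval`:

import ast
import zlib

l = [bin(i) for i in range(10)]
blob = zlib.compress(repr(l).encode('ascii'))                    # bytes in, bytes out
new_l = ast.literal_eval(zlib.decompress(blob).decode('ascii'))  # back to a list
assert new_l == l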

-
I'm not sure what you mean; because it's a string now? You can turn it back into a list with eval. – rofls Jan 01 '16 at 01:49
-
Oh, it didn't click that you would `eval` later. I think you could skip `bin` and do a `repr` on the original list. – tdelaney Jan 01 '16 at 02:08
-
You know, this is probably about as efficient as any kind of delimiter and byte length system I could come up with. And it is a preexisting structure that is fairly clear. – trevorKirkby Jan 02 '16 at 19:53
-
Thanks @someone-or-other :) You can also just do: `'lalapalooza'.encode('zlib')` and `'x\x9c\xcbI\xccI,H\xcc\xc9\xcf\xafJ\x04\x00\x1b\x1d\x04\x91'.decode('zlib')` to encode and decode strings, which doesn't even require any imports. That is, encoding and decoding are built-in string methods in Python. – rofls Jan 02 '16 at 20:32
-
this is terrible code (`bin`, `str`/`eval` -- really?). Before inventing square wheels that use `eval()`, consider existing formats such as `json`, `pickle`, `netstring`, `bson`, ascii armor, etc. – jfs Jan 03 '16 at 00:40
-
Thanks for your feedback @J.F.Sebastian. I am aware of `pickle`, `json` and `bson`. I'm not really sure how `json` would help here though. Do you care to enlighten me? Do you mean just do `json.dumps(some_list_of_ints)` and save that to a file? I think that would be very similar to writing `str(some_list_of_ints)` to a file (same plus quotes?). So IMO that's sort of like saying "just write your list to a file," which isn't really an answer. – rofls Jan 03 '16 at 03:09
-
Also, AFAIK `eval` can be used safely: I'm giving the OP the benefit of the doubt here, assuming he's not taking user input, especially arbitrary, unsanitized stuff. If you care to link me to something that says "never use Python's eval" I will further consider not using it. – rofls Jan 03 '16 at 03:10