I am dumping a string of 0s and 1s of length 4807100171 to a pickle file, because I previously had trouble with bitarray and wanted to see whether pickle could solve my problem. However, after I load it back, its length is 512132875.

Why is that?

I have searched for any limitations of pickle, but I haven't found anything. If there is a well-known reason, I might not be using the correct keywords.

Edit:

You can fill a string b with values until its length is 4807100171, using whatever technique you prefer, for example a simple for loop running to 4807100171. I personally encode the original data using Huffman coding, but that would be a long example that I feel is not really necessary here. I then dump the string b as follows:

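import cPickle as pickle

# Build a string of 4807100171 "0" characters (this needs several GB of RAM).
b = "0" * 4807100171

# Dump the string with the highest binary protocol available.
with open("string.p", "wb") as f:
    pickle.dump(b, f, pickle.HIGHEST_PROTOCOL)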
macrocosme
  • Why can't you use `bytearray` and write to a file in binary mode (`wb`)? – John Lyon Oct 16 '12 at 01:33
  • Do you have any idea why the string is truncated using pickle.dump? – macrocosme Oct 16 '12 at 02:15
  • `bytearray` doesn't seem to be a good way to go... The created file is 4.81 GB. Instead of using pickle, this time I did this: `with open('../string.p', 'wb') as f: f.write(bytearray(b))` – macrocosme Oct 16 '12 at 02:49

1 Answer

This is an integer overflow problem: notice that 4807100171 minus 2**32 is 512132875. Unfortunately, the binary pickle format represents string lengths as a 32-bit integer, so a length this large wraps around. It appears that using the text pickle format (protocol version 0) would avoid this problem, but text pickles are generally longer, and handling a string of this size would take an absurd amount of memory. I haven't actually tested this; I don't think I have enough memory on any of my computers to do so!
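
The arithmetic can be checked without allocating the string. The following is a minimal sketch (written so it also runs on Python 3); the variable names are illustrative, and the `struct` round trip only mimics a 4-byte length field rather than calling pickle itself:

```python
import struct

original_length = 4807100171

# A 4-byte length field can only hold the length modulo 2**32.
wrapped_length = original_length % 2**32
print(wrapped_length)

# Same result when keeping only the low 4 bytes of the
# little-endian encoding, as a 32-bit length field would.
low_bytes = struct.pack("<Q", original_length)[:4]
truncated_length = struct.unpack("<I", low_bytes)[0]
print(truncated_length == wrapped_length)
```

Both values come out to 512132875, the length reported in the question.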

If this one string is the only thing being stored, then it would be far simpler to just write the string itself to a file.
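
As a rough sketch of that suggestion, the string can be written and read back as raw bytes, which involves no length field at all. The helper names `write_bit_string` and `read_bit_string` and the file name are hypothetical, and the demo uses a short string; on Python 2 the `encode`/`decode` steps are unnecessary because `str` is already a byte string, and on Python 3 `encode` makes a temporary copy, so a multi-gigabyte string would need to be written in chunks.

```python
import os
import tempfile


def write_bit_string(bits, path):
    """Write a string of '0'/'1' characters to path as raw ASCII bytes."""
    with open(path, "wb") as f:
        f.write(bits.encode("ascii"))


def read_bit_string(path):
    """Read the file back into a string of '0'/'1' characters."""
    with open(path, "rb") as f:
        return f.read().decode("ascii")


# Small stand-in for the real 4807100171-character string.
bits = "0110" * 1000
path = os.path.join(tempfile.mkdtemp(), "string.bin")

write_bit_string(bits, path)
restored = read_bit_string(path)
print(len(restored) == len(bits))
```

The file size equals the string length in bytes, so a 4807100171-character string produces a file of about 4.8 GB.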

jasonharper
  • I know it is a long string, which is why I tried bitarray first. I'm actually trying to compress an already big file. There seems to be a length problem using bitarray too. However, it might be a similar problem to the one you pointed out! You can check my previous post here (which I haven't had time to update with a good code example yet): http://stackoverflow.com/questions/12449741/bitarray-to01-doesnt-return-only-0s-and-1s-in-string-python – macrocosme Oct 16 '12 at 03:05