I have a series of strings in a list named 'lines' and I compress them as follows:

import bz2
compressor = bz2.BZ2Compressor(compressionLevel)
for l in lines:
    compressor.compress(l)
compressedData = compressor.flush()
decompressedData = bz2.decompress(compressedData)

When compressionLevel is set to 8 or 9, this works fine. When it's any number between 1 and 7 (inclusive), the final line fails with an IOError: invalid data stream. The same occurs if I use the sequential decompressor. However, if I join the strings into one long string and use the one-shot compressor function, it works fine:

import bz2
compressedData = bz2.compress("\n".join(lines))
decompressedData = bz2.decompress(compressedData)
# Works perfectly

Do you know why this would be and how to make it work at lower compression levels?

thornate

1 Answer

You are throwing away the compressed data returned by compressor.compress(l). The docs say it "Returns a chunk of compressed data if possible, or an empty byte string otherwise." You need to keep every chunk, something like this:

# setup code goes here
for l in lines:
    chunk = compressor.compress(l)
    if chunk: do_something_with(chunk)
chunk = compressor.flush()
if chunk: do_something_with(chunk)
# teardown code goes here
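
To make the pattern above concrete, here is a minimal sketch that collects every chunk in a list and joins them at the end. It assumes Python 3 and that each element of lines is already a bytes object; the helper name compress_lines and the sample data are made up for illustration.

```python
import bz2


def compress_lines(lines, compression_level=9):
    """Compress an iterable of bytes objects, keeping every returned chunk."""
    compressor = bz2.BZ2Compressor(compression_level)
    chunks = []
    for line in lines:
        # compress() may return compressed data at any call, or b"" if
        # the compressor is still buffering input internally.
        chunk = compressor.compress(line)
        if chunk:
            chunks.append(chunk)
    # flush() returns whatever is still buffered and ends the stream.
    chunks.append(compressor.flush())
    return b"".join(chunks)


# Hypothetical sample input, not taken from the question.
lines = [b"first line\n", b"second line\n", b"third line\n"]

for level in range(1, 10):
    compressed = compress_lines(lines, level)
    # The round trip should succeed at every compression level.
    assert bz2.decompress(compressed) == b"".join(lines)
```

Collecting the chunks in a list and joining once avoids repeated bytes concatenation; writing each chunk straight to a file would work just as well.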

Also note that your one-shot code uses "\n".join(), which adds newlines that the chunked loop never sees. To check the one-shot result against the chunked result, use "".join().

Also beware of bytes/str issues: on Python 3 the compressor only accepts bytes, so the above should be b"whatever".join().
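
A quick way to check the chunked result against the one-shot result might look like the sketch below. It assumes Python 3 with bytes input; the sample lines and the level 5 are arbitrary choices for illustration.

```python
import bz2

# Hypothetical sample input, not taken from the question.
lines = [b"alpha\n", b"beta\n", b"gamma\n"]

# Join with an empty bytes separator so the one-shot input matches
# exactly what the chunked loop feeds to the compressor.
joined = b"".join(lines)

# One-shot compression at an arbitrary lower level.
one_shot = bz2.compress(joined, 5)

# Chunked compression at the same level, keeping every returned chunk.
compressor = bz2.BZ2Compressor(5)
chunked = b"".join(compressor.compress(line) for line in lines)
chunked += compressor.flush()

# Both streams should decompress to the same bytes, with either the
# one-shot function or the sequential decompressor.
assert bz2.decompress(one_shot) == joined
assert bz2.decompress(chunked) == joined
assert bz2.BZ2Decompressor().decompress(chunked) == joined
```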

What version of Python are you using?

John Machin
  • Ah, I see. I had missed the fact that the compress function returns partial results rather than all at once at flush(). Interesting that the compressionLevel of 8 or 9 never gets to the point that the partial result is ready - this difference might not even have shown up if I was testing on another set of documents! – thornate Jan 14 '17 at 00:56