decompressing string from database in python

Question

I know this question may look a bit unclear, but I have reached a level of frustration that drives me to ask it here.

I'm working with data from a PostgreSQL database, and I get something like this:

2022-06-01 02:21:52.770293  2022-06-01 02:21:52.78704   \\x0a78daa5534d6fe32014fc2fdca90063b0c9a91f52d...
2022-06-01 02:21:55.991809  2022-06-01 02:21:56.04597   \\x0a78dac5534d6be33010fd2fbe2b58b264c9caa9ed4...

I know that the counter column is a compressed string that contains JSON-like data. I also know (because it was decompressed in the past) that the zlib package can decompress this string, with something like zlib.decompress(mycompressedstring).

But a step is missing here, because this string (\\x0a78...) cannot be decompressed as-is. I suspect some encoding/decoding is needed before calling zlib, but I can't figure out what.

I tried:

test = bytes(sample.iloc[1]['counter'], 'UTF16')

I thought the result looked better, but zlib cannot decompress it:

testunc = zlib.decompress(test)

error: Error -3 while decompressing data: incorrect header check
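A minimal sketch with a made-up value (not the real column data) shows why encoding the text does not help: `bytes(..., 'UTF16')` encodes the *characters* of the hex text, so zlib sees a byte-order mark and UTF-16 code units where it expects a zlib header. The exact error message depends on which bytes zlib reads first, so only the failure itself is shown here.

```python
import zlib

# Made-up stand-in for one 'counter' value (not real data): the text
# "\x0a" followed by the hex digits of a zlib stream.
payload = b'{"example": true}'
text_value = "\\x0a" + zlib.compress(payload, 9).hex()

# The attempt from the question: encode the *characters* of that text.
as_utf16 = bytes(text_value, "UTF16")
print(as_utf16[:4])  # a byte-order mark, then the backslash encoded as UTF-16

# zlib finds a byte-order mark where it expects a zlib header, so it raises.
try:
    zlib.decompress(as_utf16)
except zlib.error as exc:
    print("zlib.error:", exc)

# Converting the hex digits (after \x0a) back to binary gives a real zlib header.
as_binary = bytes.fromhex(text_value[len("\\x0a"):])
print(as_binary[:2].hex())  # 78da
```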

Can someone help me, or at least point me in the right direction to find what is wrong here?

Please [*do not* use images of code/data/errors](https://meta.stackoverflow.com/a/285557/3439404) in your [mcve]. Copy the actual text, paste it into the question, then format it as code. — JosefZ, Nov 03 '22 at 15:57

score 2 · Accepted Answer · answered Nov 03 '22 at 16:14

The hexadecimal representations starting with 78da... are the starts of valid zlib streams. You need to discard the \\x0a and convert the remainder from hexadecimal to binary, then pass the result to zlib.decompress(). Look at a2b_hex in binascii.
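The steps above can be sketched as a small helper. This assumes each `counter` value arrives as text of the form shown in the question: one or two literal backslashes, an `x`, the `0a` byte, then the hex digits of the zlib stream. The function name `decode_counter` and the example value are hypothetical.

```python
import binascii
import zlib


def decode_counter(value: str) -> bytes:
    """Turn one 'counter' text value into the decompressed bytes.

    Assumes text like '\\x0a78da...' (one or two literal backslashes,
    then 'x', then hex digits), as shown in the question.
    """
    hex_digits = value.lstrip("\\")          # drop the leading backslash(es)
    if hex_digits.startswith("x"):
        hex_digits = hex_digits[1:]           # drop the 'x' of the \x prefix
    if hex_digits.startswith("0a"):
        hex_digits = hex_digits[2:]           # discard the 0a byte before the 78da header
    compressed = binascii.a2b_hex(hex_digits)  # hex text -> raw bytes
    return zlib.decompress(compressed)


# Hypothetical example value built the same way (not real column data).
example = "\\x0a" + zlib.compress(b'{"counter": 1}', 9).hex()
print(decode_counter(example))  # b'{"counter": 1}'
```

Applied to the DataFrame from the question, something like `sample['counter'].map(decode_counter)` should decode every row, provided all values share the same prefix.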

Mark Adler

Many thanks! It saved my whole week. – bAN Nov 04 '22 at 08:47
Out of curiosity, @mark, how do you know that 78da is a valid start of a zlib stream? – bAN Nov 04 '22 at 11:56

a) I instantly recognized it, having looked at many zlib streams, and b) I tested it and the subsequent bytes to see if, as far as they went, they were a valid zlib header followed by valid deflate compressed data. They are. – Mark Adler Nov 04 '22 at 15:27
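Both checks described in this comment can be sketched in Python. The header rules come from RFC 1950: the low nibble of the first byte (CM) must be 8 for deflate, its high nibble (CINFO) must be at most 7, and the two bytes read as a 16-bit big-endian number must be a multiple of 31. The "valid as far as they went" part is approximated here with `zlib.decompressobj()`, which accepts truncated input but raises `zlib.error` on invalid deflate data. The function names are hypothetical.

```python
import zlib


def looks_like_zlib_header(data: bytes) -> bool:
    """Check the two-byte zlib header (CMF, FLG) defined in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    method = cmf & 0x0F        # CM: 8 means deflate
    window = cmf >> 4          # CINFO: log2(window size) - 8, at most 7
    # FCHECK makes the 16-bit value CMF*256 + FLG a multiple of 31
    return method == 8 and window <= 7 and (cmf * 256 + flg) % 31 == 0


def plausible_zlib_prefix(data: bytes) -> bool:
    """Header check plus 'is the deflate data valid as far as it goes'."""
    if not looks_like_zlib_header(data):
        return False
    try:
        # A decompressobj tolerates truncated input but raises on invalid data.
        zlib.decompressobj().decompress(data)
    except zlib.error:
        return False
    return True


print(looks_like_zlib_header(bytes.fromhex("78da")))  # True
print(looks_like_zlib_header(bytes.fromhex("0a78")))  # False: 0x0a is not a deflate CMF byte
print(bytes.fromhex("78da")[1] >> 6)                  # 3, the FLEVEL for maximum compression
```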