I'm writing a tool to interact with a popular data warehouse SaaS. Their online SQL editor serializes SQL worksheets to JSON, but the body of each worksheet is zlib-deflated using pako.js. I'm trying to read and inflate these zlib strings from Python, but I can only decompress bytestrings whose SQL text is very short.
An example where the SQL text was just the letter a:
bytestring = b'x\xef\xbf\xbdK\x04\x00\x00b\x00b\n'
zlib.decompress(bytestring[4:-4], -15).decode('utf-8')
>>> "a"
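In case it's useful for comparison, here is what Python's own zlib emits for the same one-character input. The three-byte deflate body K\x04\x00 matches the middle of the bytestring above, but the bytes around it differ from what the SaaS stores:

```python
import zlib

# Standard zlib output for "a": 2-byte header, deflate body, 4-byte Adler-32 trailer
stream = zlib.compress(b"a")
print(stream)  # b'x\x9cK\x04\x00\x00b\x00b'

# The raw deflate body in the middle is the same K\x04\x00 seen in the saved bytestring
print(stream[2:-4])  # b'K\x04\x00'
```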
If the SQL text includes a semicolon (a;), the same approach fails:
bytestring = b'x\xef\xbf\xbdK\xef\xbf\xbd\x06\x00\x00\xef\xbf\xbd\x00\xef\xbf\xbd\n'
zlib.decompress(bytestring[4:-4], -15).decode('utf-8')
*** UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8f in position 1: invalid start byte
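One thing I noticed while poking at these bytestrings: the recurring three-byte sequence \xef\xbf\xbd is the UTF-8 encoding of U+FFFD, the Unicode replacement character, and it shows up in both examples above:

```python
# The repeated three-byte runs in both bytestrings decode to U+FFFD
seq = b'\xef\xbf\xbd'
print(seq.decode('utf-8'))              # '\ufffd' (REPLACEMENT CHARACTER)
print('\ufffd'.encode('utf-8') == seq)  # True
```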
Note: I've also tried decoding these examples with 'punycode', which I found referenced in the JavaScript implementation.
My understanding of zlib is pretty limited, but I've picked up that the first two bytes of a zlib stream are a header and the last four a checksum trailer, and that both can be skipped if zlib is run with wbits=-15 (raw deflate); that's why I slice the bytestrings before decompressing. It's entirely possible there is a wbits value that would decompress these strings without stripping the header and trailer, but I wasn't able to get any combination to work when looping wbits from -64 to 64.
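For concreteness, this is the trimming behaviour I'm relying on, sketched with a stream that Python compressed itself (note that here the header is the usual two bytes, whereas on the saved worksheets I found I had to trim four from each end):

```python
import zlib

data = b"a;"
stream = zlib.compress(data)  # 2-byte zlib header + deflate body + 4-byte Adler-32 trailer

# The default wbits expects the full zlib wrapper
assert zlib.decompress(stream) == data

# wbits=-15 expects a raw deflate stream, so the header and trailer must be stripped
assert zlib.decompress(stream[2:-4], wbits=-15) == data
```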
I've breakpointed my way through the online SQL worksheet editor's save and load functions and found that they use the pako zlib library: pako.deflate(a, {to: 'string'}) and pako.inflate(b['body'], {to: 'string'}). I'm able to inflate/deflate SQL strings in the browser using pako, but I haven't been able to reproduce the same results in Python.
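For what it's worth, this is the Python round trip I'd expect to mirror those pako calls, on the assumption that pako.deflate with default options produces a standard zlib stream. It works fine on strings I compress myself; it's only the worksheet bodies saved by the editor that I can't get back:

```python
import zlib

sql = "select * from t where x > 1;"

# Roughly what I'd expect pako.deflate(sql) to produce: a standard zlib stream
body = zlib.compress(sql.encode("utf-8"))

# ...and the inverse of pako.inflate(body, {to: 'string'})
assert zlib.decompress(body).decode("utf-8") == sql
```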