I am writing a Python function that receives a string and decompresses the string using zlib.

I am trying to translate the following Go code, which I know works, to Python (please excuse the one-letter variable names; this code was written by someone else):

var b bytes.Buffer
r := bytes.NewReader(s) // s is a []byte
z, err := zlib.NewReader(r)
if err != nil {
    // Error handling
}
_, err = io.Copy(&b, z)
if err != nil {
    // Error handling
}
err = z.Close()
if err != nil {
    // Error handling
}

The data is always received in Python as a string type rather than a bytes or byte array type - this is outside my control. (For more context, see below.)

How can I properly encode or convert the string to a bytes object that will be accepted by zlib.decompress?

Do I need to set the wbits parameter to something in particular?

Here is what I tried so far:

uncompressed = zlib.decompress(s.encode())

I am getting this error:

zlib.error: Error -3 while decompressing data: incorrect header check

I also tried

uncompressed = zlib.decompress(bytearray(s, 'utf-8'))

and

uncompressed = zlib.decompress(bytes(s, 'utf-8'))

but both failed with the same error.

Additional context

For those who are interested, here is some further context.

The system I am working on serializes a Go struct and sends the data as a raw array of bytes over the network. To save bandwidth, a portion of the data is compressed before serializing to bytes.

The Go code always gets the data as a []byte because it can unmarshal the raw JSON bytes with json.Unmarshal, like this:

env := RedactedStructName{}
err := json.Unmarshal(buf, &env) // buf is a []byte

I did not include this code above because I wanted to keep my question as simple as possible. In Python, the RedactedStructName struct does not exist.

On the other end, the Python program that I am working on needs to deserialize the data and decompress the compressed data so that it can work on it.

The data, when passed through json.loads, produces a Python dictionary. The compressed payload is a value in the dictionary. I don't know why, but json.loads always causes the compressed data to be a Python string rather than a Python bytes or bytearray object.

Shane Bishop
  • *"The data is always received as a string type"* - That just seems wrong. I suggest you ask whoever *does* have control over that to fix it. Or at least to tell you how they created the string from bytes. – Kelly Bundy Sep 13 '22 at 21:06
  • @KellyBundy, I updated the comment to be correct on the type of `s` in the Go code I am translating to Python. I also added a new section at the bottom of my question with additional context to explain why the data I receive is always a string type. – Shane Bishop Sep 13 '22 at 21:30
  • Ah, JSON. Whose strings are sequences of Unicode characters, which in Python is `str`. But the original compression surely produced *bytes*, which then *somehow* got converted to the Unicode strings in JSON. The question is how. We could guess, but I wouldn't. Ought to be documented somewhere or at least be visible in the code that does it. Maybe to Go people it's clear from the `json.Unmarshal` snippet, but I'm not one. – Kelly Bundy Sep 13 '22 at 21:51

1 Answer

I found a solution.

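As pointed out by Kelly Bundy in their comment, JSON strings are sequences of Unicode characters, which in Python is str. To my knowledge, Go handles this conversion automatically: the encoding/json package writes []byte values as base64-encoded JSON strings, and json.Unmarshal decodes them back to raw bytes when the target field is a []byte. That is why the Go code never has to deal with the base64 layer explicitly.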

In the case of Python, I needed to convert from the str to bytes like this, using the built-in binascii module:

import binascii
import zlib
uncompressed = zlib.decompress(binascii.a2b_base64(s))

I found this solution based on this answer.
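For anyone who wants to check the reasoning end to end, here is a minimal round-trip sketch. It assumes the sender base64-encodes the zlib output before placing it in the JSON, which is what Go's encoding/json does for []byte fields. The field name "payload" and the helper decompress_field are hypothetical, made up for illustration. It uses base64.b64decode with validate=True instead of binascii.a2b_base64 so that malformed input raises an error rather than being silently skipped. No wbits argument is needed as long as the sender uses Go's compress/zlib, since that produces the standard zlib format that zlib.decompress expects by default.

```python
import base64
import json
import zlib


def decompress_field(s: str) -> bytes:
    """Decode a base64 JSON string, then inflate the zlib stream inside it.

    Hypothetical helper: assumes the sender base64-encoded the compressed bytes.
    """
    raw = base64.b64decode(s, validate=True)  # str -> compressed bytes
    return zlib.decompress(raw)               # default wbits handles the zlib header


# Simulate the sending side: compress, base64-encode, embed in JSON.
# "payload" is a made-up field name for this sketch.
original = b"hello from the Go side " * 3
encoded = base64.b64encode(zlib.compress(original)).decode("ascii")
message = json.dumps({"payload": encoded})

# Receiving side: json.loads returns the field as a str, not bytes.
received = json.loads(message)
restored = decompress_field(received["payload"])
```

Calling zlib.decompress(received["payload"].encode()) on the same data fails with a zlib.error, because the bytes it sees are the base64 text rather than the compressed stream. That matches the "incorrect header check" error in the question.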

Shane Bishop
  • So Base64 was the missing link/information. How did you figure that out now? – Kelly Bundy Sep 13 '22 at 22:29
  • To be honest, it was a guess. But I did verify that the data output by this Python code matched the data output by the Go code I was translating to Python. If I have time, I might look closer at the other team's Go code to try to puzzle out where the data is base64-encoded just to be extra certain my Python code is correct. Thanks for your help, your comments helped guide my thinking. – Shane Bishop Sep 13 '22 at 23:09