-1

How do I convert a bytes object to str or JSON? I have this Python bytes object and I need to convert it to JSON or to a string. How can I do that?

b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'
Bitcoin Earn

3 Answers

4

This looks like random binary data, not encoded text. One way of storing binary data in JSON is base64 encoding, which guarantees that every output element is a printable ASCII character. The result of base64.b64encode() is still a bytes object, so .decode('ascii') is used to convert those ASCII bytes to a Unicode str that can go into an object destined for JSON.

Example:

import base64
import json

data = b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'

j = {'data':base64.b64encode(data).decode('ascii')}
s = json.dumps(j)
print(s) # resulting JSON text

# restore back to binary data
j2 = json.loads(s)
data2 = base64.b64decode(j2['data'])
print(data2 == data)

Output:

{"data": "eNoEwLENxCAMheFd/poGrvM2J0IRyUokP7tC7J5vsyLewJ5yb8RSeWIb5T9LGKo5l8Rp3BfWx6+P8wUAAP//IGwSbA=="}
True

A simpler option, at the cost of a longer result, is to use data.hex() to get a hexadecimal string representation and bytes.fromhex() to convert it back to bytes:

>>> s = data.hex()
>>> s
'78da04c0b10dc4200c85e15dfe9a06aef336274211c94a243fbb42ec9e6fb322dec09e726fc45279621be53f4b18aa3997c469dc17d6c7af8ff3050000ffff206c126c'
>>> data2 = bytes.fromhex(s)
>>> data2
b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'
>>> data2 == data
True
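
If bytes values appear in several places inside a larger structure, the base64 conversion can be pushed into the JSON layer itself. The following is only a sketch, not part of the original answer: the class name BytesEncoder, the function restore_bytes, and the "__bytes_b64__" marker key are all hypothetical names chosen for illustration. It relies on json.JSONEncoder.default() and the object_hook parameter of json.loads().

```python
import base64
import json


class BytesEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes bytes as a tagged base64 string."""

    def default(self, o):
        if isinstance(o, (bytes, bytearray)):
            # "__bytes_b64__" is an arbitrary marker key chosen for this sketch.
            return {"__bytes_b64__": base64.b64encode(o).decode("ascii")}
        # Let the base class raise TypeError for anything else unsupported.
        return super().default(o)


def restore_bytes(obj):
    """object_hook for json.loads: turns tagged dicts back into bytes."""
    if set(obj) == {"__bytes_b64__"}:
        return base64.b64decode(obj["__bytes_b64__"])
    return obj


payload = {"name": "sample", "data": b"x\xda\x04\xc0\x00\xff"}
text = json.dumps(payload, cls=BytesEncoder)
restored = json.loads(text, object_hook=restore_bytes)
print(restored == payload)
```

The tagged dict makes the round trip unambiguous, at the cost of a few extra characters per value; a plain base64 string is shorter but cannot be told apart from ordinary text when loading.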
Mark Tolonen
1

Use the decode() method of the bytes object and pass the encoding that was used to produce the bytes as an argument.
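
A minimal sketch of what that looks like, assuming the bytes really are encoded text. The sample strings are made up for illustration. It also shows what happens when the bytes are not valid in the chosen encoding, using a short prefix of binary data similar to that in the question.

```python
# Bytes that really are UTF-8 encoded text decode cleanly.
text_bytes = "héllo wörld".encode("utf-8")
decoded = text_bytes.decode("utf-8")
print(decoded)

# Arbitrary binary data is usually not valid UTF-8 and raises an error.
binary = b"x\xda\x04\xc0\xb1"
failed = False
try:
    binary.decode("utf-8")
except UnicodeDecodeError as exc:
    failed = True
    print("not valid UTF-8:", exc.reason)

# errors="replace" avoids the exception but is lossy: the original
# bytes cannot be recovered from the resulting string.
lossy = binary.decode("utf-8", errors="replace")
print(lossy.encode("utf-8") == binary)
```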

Dennis
0

You don't have to convert the binary data with the base64 encoding algorithm or into a hexadecimal string, as @Mark Tolonen suggests in his answer. Both of those require substantially more space to represent the data than the original.

Instead, you can take advantage of the fact that JSON strings are "a sequence of zero or more Unicode characters" (per the JSON spec). The latin1 (ISO-8859-1) codec maps each byte value 0x00-0xFF to the Unicode code point with the same number, so any bytes object can be "decoded" as latin1 into a str and later "encoded" back to the original binary data without loss.

Here's what I mean:

import json

data = b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'

j = {'data': data.decode('latin1')}
s = json.dumps(j)
print(s) # resulting JSON text

# restore back to binary data
j2 = json.loads(s)
data2 = j2['data'].encode('latin1')
assert data2 == data  # Should be identical.
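
As a quick check of why this round trip is lossless, the sketch below (an illustration, not part of the original answer) runs all 256 byte values through latin1. It also shows that the character count is not the same as the size on disk once the string is written out as UTF-8.

```python
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin1")

# Each byte becomes the code point with the same numeric value.
code_points = [ord(c) for c in as_text]
print(code_points == list(range(256)))

# Encoding back with latin1 restores the exact original bytes.
round_trip = as_text.encode("latin1")
print(round_trip == all_bytes)

# Written as UTF-8, code points above 0x7F take two bytes each:
# 128 one-byte characters + 128 two-byte characters = 384 bytes.
utf8_size = len(as_text.encode("utf-8"))
print(len(all_bytes), len(as_text), utf8_size)
```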

Here's the difference it makes for your sample data:

import base64

print(len(data))                                    # -> 67
print(len(data.decode('latin1')))                   # -> 67
print(len(base64.b64encode(data).decode('ascii')))  # -> 92
print(len(data.hex()))                              # -> 134

✶ Note that I learned about the encoding trick from an answer by @Sven Marnach to a question about serializing binary data long ago (and have used it multiple times since).

martineau
  • Look at the data once you write it to JSON, though. Even with `ensure_ascii=False`, bytes like 0x00 become `'\\u0000'`. If there are lots of control bytes 0x00-0x1f in the data, it still gets rather large. And if written with the standard UTF-8 encoding, the code points > 0x7F double in size as well. Base64 is consistently about 33% bigger. – Mark Tolonen Jul 16 '22 at 20:35
  • Case in point: `len(json.dumps(bytes(range(0x20)).decode('latin1'),ensure_ascii=False))`. 32 bytes becomes 174 code points. – Mark Tolonen Jul 16 '22 at 20:51
  • @Mark: Yes, your mileage will vary depending on the data involved. – martineau Jul 16 '22 at 20:53
  • Even with the sample data, dump it to JSON: `len(json.dumps(data.decode('latin1'),ensure_ascii=False))` -> 122 bytes. – Mark Tolonen Jul 16 '22 at 20:57
  • @Mark: That last example was an apples to oranges comparison. Your point about it depending on the data has been made. – martineau Jul 16 '22 at 21:00
  • oranges to oranges then, just to complete the point: `len(json.dumps(base64.b64encode(data).decode('ascii'),ensure_ascii=False))` -> 94 – Mark Tolonen Jul 16 '22 at 21:04
  • @Mark: For the record, note that the `base64` module also supports Adobe [Ascii85](https://en.wikipedia.org/wiki/Ascii85) (aka Base85) encoding via `a85encode()`, which is better than `b64encode()` in the sense of having a smaller percentage increase. – martineau Jul 16 '22 at 21:13
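
The size discussion in these comments can be reproduced with a small sketch. The helper name json_sizes is hypothetical, and one substitution is made deliberately: it uses `base64.b85encode()` instead of `a85encode()`, because the Ascii85 alphabet contains `"` and `\`, which JSON has to escape, while the `b85encode()` alphabet avoids both. The worst-case input from the comments, `bytes(range(0x20))`, is used so the numbers can be checked by hand.

```python
import base64
import json


def json_sizes(data: bytes) -> dict:
    """Length in characters of the JSON string literal for each encoding."""
    candidates = {
        "latin1": data.decode("latin1"),
        "base64": base64.b64encode(data).decode("ascii"),
        "base85": base64.b85encode(data).decode("ascii"),
        "hex": data.hex(),
    }
    return {
        name: len(json.dumps(text, ensure_ascii=False))
        for name, text in candidates.items()
    }


# 32 control bytes: every one of them must be escaped in a JSON string.
control_bytes = bytes(range(0x20))
sizes = json_sizes(control_bytes)
print(sizes)
# -> {'latin1': 174, 'base64': 46, 'base85': 42, 'hex': 66}
```

Each count includes the two surrounding quote characters. For data made mostly of printable bytes the latin1 figure would be much closer to the raw length, so it is worth running the helper on a representative payload before choosing.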