
I am trying to encode a dictionary containing a bytes object with json, and I am getting an "is not JSON serializable" error:

import base64
import json

data = {}
encoded = base64.b64encode(b'data to be encoded')
data['bytes'] = encoded

print(json.dumps(data))

The error I get:

TypeError: b'ZGF0YSB0byBiZSBlbmNvZGVk\n' is not JSON serializable

How can I correctly encode my dictionary containing bytes with JSON?

Fanta (edited by cottontail)

2 Answers


json.dumps() expects strings in its input; it cannot serialize bytes. Since base64.b64encode() returns bytes, you need to decode those bytes into a string using the ASCII codec:

import base64

data = {}
encoded = base64.b64encode(b'data to be encoded')  # b'ZGF0YSB0byBiZSBlbmNvZGVk' (notice the "b")
data['bytes'] = encoded.decode('ascii')            # 'ZGF0YSB0byBiZSBlbmNvZGVk'

Note that to get the original data back you don't need to re-encode it to bytes because b64decode() handles ASCII-only strings as well as bytes:

decoded = base64.b64decode(data['bytes'])  # b'data to be encoded'
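Putting the two steps together, a complete round trip might look like this (the variable names here are just illustrative):

```python
import base64
import json

# Encode the bytes with base64, then decode to an ASCII str for JSON.
data = {'bytes': base64.b64encode(b'data to be encoded').decode('ascii')}
serialized = json.dumps(data)  # '{"bytes": "ZGF0YSB0byBiZSBlbmNvZGVk"}'

# On the way back, b64decode() accepts the ASCII-only str directly;
# no .encode() call is needed before decoding.
restored = base64.b64decode(json.loads(serialized)['bytes'])
print(restored)  # b'data to be encoded'
```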
Martijn Pieters
  • I know this is an old post but you might want to use `UTF8` instead of `ASCII` – Jelle de Fries Oct 30 '20 at 10:53
  • @JelledeFries: No, because Base64 only ever uses ASCII characters. – Martijn Pieters Oct 30 '20 at 15:33
  • @MartijnPieters technically it's a moot point. All ASCII characters are also UTF-8 characters *by design*. However Jelle de Fries has a point because UTF-8 is so ubiquitous that encoding to anything else ought to be done for good reason. I see no good reason to choose ASCII over UTF-8. – Philip Couling Jan 27 '21 at 22:35
  • @PhilipCouling: except that base64 only ever uses ASCII characters, and if code that is supposed to only produce ASCII codepoints is not in fact doing so, something is broken. I like having such safety valves in place, and being explicit about constraints. – Martijn Pieters Jan 30 '21 at 13:11
  • @MartijnPieters it would just be `encoded.decode()` because `'utf-8'` is the [default argument](https://docs.python.org/3/library/stdtypes.html#bytes.decode) to `.decode()`. It would be shorter and also anyone reading the code doesn't need to know about encodings or the difference between ASCII and UTF-8, they just need to know "it's bytes and you need to turn it into a string with `.decode()`". Personally, I wrote software for a couple years before I understood what UTF-8 was. I think it's just unnecessary explicitness; we know it's going to be ASCII, there's no need to error if it's not. – Boris Verkhovskiy Jun 26 '23 at 00:14
  • @BorisVerkhovskiy: yes, ASCII is a subset of UTF8, but just using the default is a shortcut that'll trip you up if the default ever changed or the output of `base64.b64encode()` ever changed to cover more than ASCII (or the function was replaced by accident, etc.). You should **always** be thinking about what codec the bytes are in when decoding, by the way. – Martijn Pieters Jun 26 '23 at 18:40
  • None of those things will ever change and I shouldn't think about that at all, computers should just use one encoding - UTF-8 – Boris Verkhovskiy Jun 26 '23 at 22:14
  • @BorisVerkhovskiy: that may well be, but mistakes happen, and Python is highly malleable (you can replace functions dynamically). Better to catch those mistakes as early as possible. Note that I **explicitly talk about this** in my comment from January 2021, too. It's a safety valve; please don't remove safety valves just because it looks shorter. – Martijn Pieters Jun 28 '23 at 12:31
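To illustrate the "safety valve" point from the comment thread: decoding with the ASCII codec raises immediately if the bytes contain anything outside ASCII, while plain `.decode()` (UTF-8) would silently accept valid UTF-8 sequences. The corrupted value below is fabricated purely for illustration:

```python
good = b'ZGF0YQ=='           # what base64.b64encode() actually produces
bad = b'ZGF0YQ==\xc3\xa9'    # hypothetical corrupted output with non-ASCII bytes

print(good.decode('ascii'))  # 'ZGF0YQ=='

try:
    bad.decode('ascii')      # the safety valve: this fails loudly
except UnicodeDecodeError as exc:
    print('caught:', exc)

# The same corrupted bytes decode without complaint as UTF-8:
print(bad.decode('utf-8'))
```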

As @Martijn mentioned, only strings are JSON-serializable with json; bytes are not. A bytes object can be converted into a string using a str() call by specifying the encoding:

import base64
import json

data = {}
encoded = base64.b64encode(b'data to be encoded')
data['bytes'] = str(encoded, encoding='ascii')
#               ^^^          ^^^^^^^^^^^^^^^^
json.dumps(data)   # '{"bytes": "ZGF0YSB0byBiZSBlbmNvZGVk"}'
cottontail