4

I'm trying to convert my output to JSON format, but I get an error. If I remove the json.dump call, I get the data in base64 form, but with json.dump it raises an error.

Code:

import json 
import base64

with open(r"C:/Users/Documents/pdf2txt/outputImage.jpg","rb") as img:
    image = base64.b64encode(img.read())
    data['ProcessedImage'] = image

print(json.dump(data)

Output:

TypeError: Object of type 'bytes' is not JSON serializable

When using:

print(json.dumps(dict(data)))

It's also showing the same error

martineau
NKJ
  • Please post *valid* code that reproduces the behavior described. The code shown will fail to parse or run, for at least two separate reasons. Without *valid* code that reproduces the issue, one hypothesis (sometimes correct) is that the actual problematic code and the code shown differ. – user2864740 Oct 17 '20 at 17:37
  • You're getting the same error because it has nothing to do with what you are using in the `print` function call — it's from the `image = base64.b64encode(img.read())` line. – martineau Apr 20 '21 at 14:24

3 Answers

6

You have to use the bytes.decode() method.

You are trying to serialize an object of type bytes to JSON. JSON has no bytes type, so you have to convert the bytes to a string first.

Also, you should use json.dumps() instead of json.dump(), because you don't want to write to a file.

In your example:

import json
import base64

data = {}  # was never defined in the question's snippet

with open(r"C:/Users/Documents/pdf2txt/outputImage.jpg", "rb") as img:
    image = base64.b64encode(img.read())  # b64encode() returns bytes
    data['ProcessedImage'] = image.decode()  # store a str, not just image

print(json.dumps(data))
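
To make the difference between json.dumps() and json.dump() concrete, here is a minimal sketch. It assumes a few stand-in bytes instead of the real image file, and an in-memory io.StringIO buffer instead of a file on disk; both are placeholders for illustration only.

```python
import base64
import io
import json

# Stand-in bytes (an assumption); a real script would read them from the image.
raw = b"\xff\xd8\xff\xe0 not a real JPEG"

data = {"ProcessedImage": base64.b64encode(raw).decode()}

# json.dumps() returns the JSON document as a str.
as_text = json.dumps(data)

# json.dump() writes the same document to a file-like object and returns None.
buffer = io.StringIO()
result = json.dump(data, buffer)

print(as_text == buffer.getvalue())
print(result)
```

Printing the return value of json.dump() therefore never shows the JSON text, which is why json.dumps() is the better fit when the goal is to print.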
Kumpelinus
3

First of all, I think you should use json.dumps(), because you're calling json.dump() with incorrect arguments (it also expects a file object to write to) and it doesn't return anything to print.

Secondly, as the error message indicates, you can't serialize objects of type bytes; json.dumps() expects strings and other JSON-compatible types. To do this properly, you need to decode the binary data into a Python string with some encoding. To preserve the data, you should use latin1 encoding, because arbitrary binary strings are valid latin1, which can always be decoded to Unicode and then encoded back to the original bytes again (as pointed out in this answer by Sven Marnach).

Here's your code showing how to do that (plus corrections for the other not-directly-related problems it had):

import json
import base64

image_path = "C:/Users/Documents/pdf2txt/outputImage.jpg"
data = {}

with open(image_path, "rb") as img:
    image = base64.b64encode(img.read()).decode('latin1')
    data['ProcessedImage'] = image

print(json.dumps(data))
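
As a side note on the choice of codec: base64 output consists only of ASCII characters, so for this particular value latin1, ascii, and the default UTF-8 should all produce the same string. The sketch below checks that on an arbitrary byte sequence chosen for illustration (bytes.isascii() needs Python 3.7 or later).

```python
import base64

# Every possible byte value, as a stand-in for arbitrary binary data.
raw = bytes(range(256))
encoded = base64.b64encode(raw)

# The base64 alphabet is a subset of ASCII.
only_ascii = encoded.isascii()

# All three codecs agree on ASCII-only input.
as_latin1 = encoded.decode("latin1")
as_ascii = encoded.decode("ascii")
as_utf8 = encoded.decode()
same = as_latin1 == as_ascii == as_utf8

# Encoding back and base64-decoding restores the original bytes.
restored = base64.b64decode(as_latin1.encode("latin1"))

print(only_ascii, same, restored == raw)
```

The latin1 guarantee matters more when raw bytes are decoded directly, without a base64 step in between.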
martineau
  • I'm still having the same issue: the printed output is base64-encoded. As I'm new to this, can you please show me how to decode the base64 to get the real data? Thanks in advance. – NKJ Oct 18 '20 at 17:50
  • Not sure exactly what you're asking. You can undo the decoding of the value and get the `bytes` back with `image.encode('latin1')`. – martineau Oct 18 '20 at 17:59
2

image (or anything returned by base64.b64encode) is a binary bytes object, not a string. JSON cannot deal with binary data. You must decode the image data if you want to serialize it:

data['ProcessedImage'] = image.decode()
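
To get the original binary data back on the receiving side, the string can be passed to base64.b64decode() after parsing the JSON. This is a minimal round-trip sketch; the sample bytes are a made-up stand-in for the image contents, not data from the question.

```python
import base64
import json

# Stand-in for the image bytes (an assumption for illustration).
original = b"\x89PNG\r\n\x1a\n example bytes"

# Sender: bytes -> base64 bytes -> str -> JSON text.
payload = json.dumps({"ProcessedImage": base64.b64encode(original).decode()})

# Receiver: JSON text -> dict -> base64 str -> original bytes.
parsed = json.loads(payload)
recovered = base64.b64decode(parsed["ProcessedImage"])

print(recovered == original)
```

The recovered bytes could then be written to a file opened in "wb" mode to recreate the image.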
DYZ
  • How can I decode the base64-encoded value to get the actual data back? – NKJ Oct 18 '20 at 17:51