How to fix UnicodeDecodeError for bytes from requests?

Question

I have the following full working example code using selenium-wire to record all requests made.

import os
import sys
import json
from seleniumwire import webdriver

driver = webdriver.Chrome()

driver.get("http://www.google.com")

list_requests = []
for request in driver.requests:
    req = {
        "method": request.method,
        "url": request.url,
        "body": request.body.decode(), # to avoid json error
        "headers": {k:str(v) for k,v in request.headers.__dict__.items()} # to avoid json error
    }
  
    if request.response:
        resp = {
            "status_code": request.response.status_code,
            "reason": request.response.reason,
            "body": request.response.body.decode(), # ???
            "headers": {k:str(v) for k,v in request.response.headers.__dict__.items()} # to avoid json error
        }
        req["response"] = resp
    list_requests.append(req)

with open(f"test.json", "w") as outfile:
    json.dump(list_requests, outfile)

However, the decoding of the response body creates an error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte

and without trying to decode the response body I get an error

TypeError: Object of type bytes is not JSON serializable

I do not care about the encoding, I just want to be able to write the 'body' to the json file in some way. If needed the byte/character in question can be removed, I do not care.

Any ideas how this problem can be solved?

if you're getting bytes from a web request, you may just want to ignore them. Chances are that is a file of some kind (upload or download?) so trying to translate that into JSON data is not very useful. — pcalkins, Feb 17 '22 at 21:28
If these are text files being transferred they may or may not have a BOM (byte-order-marker) which might tell you what encoding is being used. — pcalkins, Feb 17 '22 at 21:31

score 1 · Answer 1 · answered Feb 18 '22 at 08:03

I've used the following approach to extract a field (some_key) from a JSON response:

from gzip import decompress
import json

some_key = None
data = None  # defined up front so the final print works even if nothing matches

for request in driver.requests:
    # only look at POST requests that actually received a response
    if request.response and request.method == 'POST':
        print(request.method + ' ' + request.url)
        body = request.response.body
        try:
            # try to parse the raw body as JSON
            data = json.loads(body)
            print('parsed as json')
        except UnicodeDecodeError:
            # the body is not valid text, so it is probably gzip-compressed
            try:
                data = json.loads(decompress(body))
                print('decompressed and parsed as json')
            except (OSError, UnicodeDecodeError, json.JSONDecodeError):
                # not gzip, or not JSON after decompression: keep the raw bytes
                data = body
                print('could not decompress or parse')
        except json.JSONDecodeError:
            # valid text, but not JSON: keep the raw bytes
            data = body
            print('not json')
        if isinstance(data, dict) and 'some_key' in data:
            some_key = data['some_key']
print(data)
print(some_key)

gzip.decompress helped me with UnicodeDecodeError.
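For the original goal of writing every body to the JSON file whatever it contains, the same idea can be taken one step further. The following is only a minimal sketch, assuming a body is either gzip-compressed or sent as-is; `body_to_json_value` is a hypothetical helper name, not part of selenium-wire. It decompresses when the gzip magic bytes are present, decodes as UTF-8 when that works, and otherwise falls back to base64 so the result is always JSON serializable.

```python
import base64
import gzip
import json
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def body_to_json_value(body: bytes) -> dict:
    """Return a JSON-serializable description of a raw HTTP body.

    Hypothetical helper for illustration; not part of selenium-wire.
    """
    if body.startswith(GZIP_MAGIC):
        try:
            body = gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            # looked like gzip but is truncated or corrupt: keep the raw bytes
            pass
    try:
        return {"encoding": "utf-8", "data": body.decode("utf-8")}
    except UnicodeDecodeError:
        # binary content: base64 keeps every byte and is plain ASCII
        return {"encoding": "base64", "data": base64.b64encode(body).decode("ascii")}
```

In the loop from the question, this would be used as `"body": body_to_json_value(request.response.body)`. If losing bytes is acceptable, `body.decode("utf-8", errors="replace")` is a shorter alternative.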

Hope this will be helpful.

score 1 · Answer 2 · answered May 04 '23 at 07:38

Chrome may send requests that you don't expect, and they can break the response analysis if you don't handle them. These are GET requests like the following:

GET https://r3---sn-n4g-gon6.gvt1.com/edgedl/chrome/dict/en-us-10-1.bdic?cms_redirect=yes&mh=7g&mip=XXX.XXX.XXX.XXX&mm=28&mn=sn-n4g-gon6&ms=nvh&mt=1683454352068&mv=u&mvi=3&pl=24&rmhost=r4---sn-n4g-gon6.gvt1.com&shardbypass=sd
GET https://content-autofill.googleapis.com/v1/pages/ChVDaHJvbWUvMTExLjAuNTU2My4xMTASnwMJD36cJ79vbM...

So filtering on the POST method and the domain could do the trick:

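import json

for request in driver.requests:
    if request.response:
        # 'https://mydomain.com/path'.split('/') gives ['https:', '', 'mydomain.com', 'path'], so the host is at index 2
        if request.method == 'POST' and request.url.split('/')[2] == 'mydomain.com':
            data = json.loads(request.response.body)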
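Indexing into `url.split('/')` is easy to get wrong and also trips over ports or upper-case hosts. A possible alternative, sketched here with the standard library's `urllib.parse`, compares the parsed hostname instead; `is_wanted` is a hypothetical helper name and `mydomain.com` is the same placeholder as above.

```python
from urllib.parse import urlparse


def is_wanted(method: str, url: str, host: str = "mydomain.com") -> bool:
    """True for POST requests whose URL host equals `host` (hypothetical helper)."""
    # urlparse(...).hostname is lower-cased and excludes any port
    return method.upper() == "POST" and urlparse(url).hostname == host
```

Inside the loop this would read `if request.response and is_wanted(request.method, request.url):`.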
