I am trying to write data I scraped from a site to a JSON output file with the following code:

from bs4 import BeautifulSoup
import requests
import json

path = ["https://www.test.be?page=,https://www.test2.be?page="]

adresArr = []
for i in path:
    pagina = 0;
    for x in range(0, 4):
        url = i + str(pagina)
        response = requests.get(url, timeout=5)
        content = BeautifulSoup(response.content, "html.parser")
        for adres in content.findAll('tr', attrs={"class": "odd clickable-row"}):
            adresObject = {
                "postcode": adres.find('td', attrs={"class": "views-field views-field-field-locatie-postal-code"}).text.encode('utf-8'),
                "naam": adres.find('td', attrs={"class": "views-field views-field-field-locatie-thoroughfare"}).text.encode('utf-8'),
                "plaats": adres.find('td', attrs={"class": "views-field views-field-field-locatie-locality"}).text.encode('utf-8')
            }
            adresArr.append(adresObject)


        for adres in content.findAll('tr', attrs={"class": "odd clickable-row active"}):
            adresObject = {
                "postcode": adres.find('td', attrs={"class": "views-field views-field-field-locatie-postal-code"}).text.encode('utf-8'),
                "naam": adres.find('td', attrs={"class": "views-field views-field-field-locatie-thoroughfare"}).text.encode('utf-8'),
                "plaats": adres.find('td', attrs={"class": "views-field views-field-field-locatie-locality"}).text.encode('utf-8')
            }
            adresArr.append(adresObject)

            pagina = x

    with open('adresData.json', 'w') as outfile:
         json.dump(adresArr, outfile)

I am getting the following error: object of type bytes is not json serializable

If I print the array itself, it looks OK, but I'm stuck at writing it to a JSON file. What am I doing wrong?

It's my first time coding in Python (and I don't have a lot of coding experience), so please make your answer easy to understand :)

Thanks in advance

2 Answers

1

To resolve this problem, you just have to convert the data type of your elements from bytes to str. Here is a reference to a previously answered question about the same error:

TypeError: Object of type 'bytes' is not JSON serializable

This might help.
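One possible way to do that conversion, as a rough sketch rather than the only approach: the json module accepts a `default` function that is called for any value it cannot serialize. The helper name `decode_bytes` and the sample dictionary are made up for illustration.

```python
import json

def decode_bytes(value):
    """Hypothetical helper: turn bytes into str so json can serialize them."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    # Anything else that json cannot handle is still reported as an error.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")

# Sample data shaped like one adresObject from the question (values invented).
adres = {"postcode": b"9000", "plaats": "Gent".encode("utf-8")}

# json calls decode_bytes only for the values it does not know how to write.
as_json = json.dumps(adres, default=decode_bytes)
print(as_json)
```

This keeps the scraping code unchanged, but not creating the bytes in the first place is usually simpler.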

0

In the lines like this:

"postcode": adres.find('td', attrs={"class": "views-field views-field-field-locatie-postal-code"}).text.encode('utf-8')

The .text result should already be a string; .encode('utf-8') makes it the bytes object that the json library is complaining about. So just leave that off: adres.find('td', attrs={"class": "views-field views-field-field-locatie-postal-code"}).text.
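A small sketch of the difference, without any scraping involved. The variable `text` is only a stand-in for what `adres.find(...).text` returns, and the value "9000" is invented.

```python
import json

# Stand-in (assumption) for the string returned by adres.find(...).text
text = "9000"

# With .encode('utf-8') the value becomes bytes, which json rejects.
try:
    json.dumps({"postcode": text.encode("utf-8")})
    error = None
except TypeError as exc:
    error = str(exc)

# Without .encode('utf-8') the value stays a str and serializes fine.
fixed = json.dumps({"postcode": text})

print(error)
print(fixed)
```

The same change applies to the "naam" and "plaats" entries in both loops of the question's code.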

Background info: bytes are the raw units of information; strings are how we represent text. We encode a string to make the bytes that are used for storage; we decode bytes to get a string back. But JSON is already designed to work with strings - the library will handle the file encoding for you when it actually writes to the disk.
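To make the background concrete, here is a minimal sketch of the encode/decode round trip and of letting the file object handle the encoding. The sample name and the temporary directory are assumptions for the example; only the file name adresData.json comes from the question.

```python
import json
import os
import tempfile

naam = "Café Gent"  # invented sample text with a non-ASCII character

# Encoding turns the string into raw bytes; decoding turns them back.
raw = naam.encode("utf-8")
restored = raw.decode("utf-8")

# Write to a temporary directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "adresData.json")

# json.dump receives plain strings; open() applies the UTF-8 encoding on disk.
with open(path, "w", encoding="utf-8") as outfile:
    json.dump([{"naam": naam}], outfile, ensure_ascii=False)

# Reading the file back gives the original strings again.
with open(path, encoding="utf-8") as infile:
    loaded = json.load(infile)

print(type(raw).__name__, type(restored).__name__, loaded)
```

Passing `ensure_ascii=False` is optional; without it, json writes non-ASCII characters as `\u` escapes, which load back to the same text.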

Karl Knechtel