I received this error after running my spider. I also have a pipeline that converts everything to JSON, but I still get this error after my item is returned:

TypeError: Object of type 'bytes' is not JSON serializable

My pipeline code is:


    import json
    import re
    import types

    SEPARATOR = '-'
    FILING_PROPERTIES = ['state_id', 'types', 'description', 'filing_parties', 'filed_on']
    DOCUMENT_PROPERTIES = ['types', 'title', 'blob_name', 'state_id', 'source_url']


    class AeeiPipeline(object):
        def process_item(self, item, spider):
            import pdb
            #
            if item.get('title', None):
                item['source_title'], item['title'] = self.title_case(item['title'])
            if item.get('description'):
                pdb.set_trace()
                item['description'] = self.title_case(item['description'])
            for filing in item.get("filings", []):
                if filing.get('description'):
                    pdb.set_trace()
                    filing['description'] = self.title_case(filing['description'])
                for _key in ["filing_parties", "types"]:
                    if not (_key in filing and filing[_key]):
                        filing[_key] = []
                    elif isinstance(filing[_key], str):
                        filing[_key] = [filing[_key]]

                for doc in filing.get("documents", []):
                    if doc.get('name'):
                        doc['name'] = doc['name']
                    if doc.get('title'):
                        doc['title'] = self.make_unicode(doc['title'])
                    if "types" in doc and not type(doc["types"]) is list:
                        doc["types"] = [doc["types"]]
            for _key in ["industries", "assignees", "major_parties", "source_assignees", "source_major_parties"]:
                if not (_key in item and item[_key]):
                    item[_key] = []
                elif isinstance(item[_key], str):
                    item[_key] = [item[_key]]

            for key, value in item.items():
                if type(item[key]) is str:
                    item[key] = value.strip()
            pdb.set_trace()
            item = json.dumps(item) + '\n'
            return item

        def title_case(self, title):
            title = self.make_unicode(title)
            return title, re.sub(u"[A-Za-z]+(('|\u2019)[A-Za-z]+)?",
                                 lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
                                 title)
CodeRed
shahrukh ijaz
  • It means you have a `bytes` field in your dict. You have a few options: build your own JSON encoder, or simply cast to `str`. Does `json.dumps(item, default=str)` work? – jlandercy Sep 20 '19 at 07:02
  • I did this: `item = json.dumps(item) + '\n'`, but got the error **TypeError: Object of type 'PucItem' is not JSON serializable** – shahrukh ijaz Sep 20 '19 at 07:03
  • Please read [mcve] to learn how to write a good question. My advice: avoid adding characters to a JSON string yourself; that is the job of the JSON encoder (which is called by the `dumps` method). Without the full traceback (copied and pasted, not a screenshot) it is difficult to know what is wrong. Have you tried `default=str` without adding `\n` at the end, as suggested in my previous comment? – jlandercy Sep 20 '19 at 11:11

1 Answer

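    TypeError: Object of type 'PucItem' is not JSON serializable

This error means the item is an instance of Scrapy's `Item` class (here `PucItem`), which `json.dumps` cannot serialize directly.

Either convert the item to a plain dict before serializing:

    item = json.dumps(dict(item))

Or, in your spider, do not use an `Item` class to create the item; use a plain dict instead, e.g. `item = {}`.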
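As a rough sketch of how both errors from the question could be handled in one place: the snippet below converts a dict-like item with `dict(item)` and passes a `default` callback to `json.dumps` to decode any `bytes` values. `FakeItem`, `encode_fallback`, and `item_to_json_line` are hypothetical names invented for this illustration; `FakeItem` only stands in for a Scrapy `Item` (dict-like, but not a `dict` subclass) so the example runs without Scrapy installed. It also assumes the bytes are UTF-8 text, which may not hold for your data.

```python
import json
from collections.abc import MutableMapping


class FakeItem(MutableMapping):
    """Hypothetical stand-in for a Scrapy Item: dict-like, but not a dict subclass."""

    def __init__(self, **fields):
        self._values = dict(fields)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def encode_fallback(value):
    """Called by json.dumps only for values it cannot serialize on its own."""
    if isinstance(value, bytes):
        # Assumption: the bytes hold UTF-8 text; undecodable bytes are replaced.
        return value.decode("utf-8", errors="replace")
    if isinstance(value, MutableMapping):
        # Nested dict-like items are converted to plain dicts.
        return dict(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def item_to_json_line(item):
    """Serialize a dict-like item to one line of JSON."""
    return json.dumps(dict(item), default=encode_fallback) + "\n"


item = FakeItem(title=b"Docket 123", filings=[FakeItem(description="Initial filing")])
line = item_to_json_line(item)
print(line, end="")
```

Passing `default=str`, as suggested in the comments, also avoids the error, but for `bytes` it stores the repr-style text (`b'...'`) rather than the decoded string, so an explicit decode is usually closer to what you want.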

Umair Ayub