
I wrote a pipeline to send my Scrapy data to my Parse backend, using these settings:

PARSE = 'api.parse.com'
PORT = 443

However, I can't find the right way to post the data to Parse: every time, it creates undefined objects in my Parse DB.

class Newscrawlbotv01Pipeline(object):
    def process_item(self, item, spider):
        for data in item:
            if not data:
                raise DropItem("Missing data!")
        connection = httplib.HTTPSConnection(
            settings['PARSE'],
            settings['PORT']
        )
        connection.connect()
        connection.request('POST', '/1/classes/articlulos', json.dumps({item}), {
            "X-Parse-Application-Id": "XXXXXXXXXXXXXXXX",
            "X-Parse-REST-API-Key": "XXXXXXXXXXXXXXXXXXX",
            "Content-Type": "application/json"
        })
        log.msg("Question added to PARSE !", level=log.DEBUG, spider=spider)
        return item

Example of the error:

TypeError: set([{'image': 'http://apps.site.lefigaro.fr/sites/apps/files/styles/large/public/thumbnails/image/sport24.png?itok=caKsKUzV',
 'language': 'FR',
 'publishedDate': datetime.datetime(2016, 3, 16, 21, 53, 10, 289000),
 'publisher': 'Le Figaro Sport',
 'theme': 'Sport',
 'title': u'Pogba aurait rencontr\xe9 les dirigeants du PSG',
 'url': u'sport24.lefigaro.fr/football/ligue-des-champions/fil-info/prolongation-entre-le-bayern-et-la-juve-796778'}]) is not JSON serializable
  • your log says `connection.request('POST', '/1/classes/articlulos', json.dumps({data}), {` while your code `connection.request('POST', '/1/classes/articlulos', json.dumps({item}), {`, are you giving bad examples? – eLRuLL Mar 16 '16 at 20:48
  • Oh yes, sorry, that error is from when the call used `json.dumps({data})` instead of `json.dumps({item})`. – Thomas Simonini Mar 16 '16 at 21:26
  • could you update your question? (edit it) – eLRuLL Mar 16 '16 at 21:27

2 Answers

0

It looks like you have a set inside item['data'], and JSON has no representation for sets.

You need to convert that field back to a list before serializing it to JSON.
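To illustrate the point about sets, here is a minimal sketch in plain Python 3. The `record` dict and its field names are made up for the example and only stand in for a scraped item; the sketch shows `json.dumps` rejecting a set and accepting the same values once they are a list.

```python
import json

# Hypothetical stand-in for a scraped item; the field names are illustrative.
record = {"title": "Example", "tags": {"sport", "football"}}

try:
    json.dumps(record)
    set_ok = True
except TypeError:
    # Sets have no JSON equivalent, so the encoder raises TypeError.
    set_ok = False

# Converting the set to a (sorted) list makes the record serializable.
record["tags"] = sorted(record["tags"])
encoded = json.dumps(record)
```

Sorting is optional; it is used here only so the resulting list has a predictable order.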

eLRuLL
  • don't surround `item` with `{}`, just pass the item – eLRuLL Mar 16 '16 at 22:06
  • I passed the item json.dumps(item) but it's the same error: TypeError: {'image': 'http://apps.site.lefigaro.fr/sites/apps/files/styles/large/public/thumbnails/image/sport24.png?itok=caKsKUzV', 'language': 'FR', 'publishedDate': datetime.datetime(2016, 3, 16, 22, 10, 43, 146000), 'publisher': 'Le Figaro Sport', 'theme': 'Sport', 'title': u'Une affiche du Bayern choque la communaut\xe9 juive italienne', 'url': u'sport24.lefigaro.fr/tennis/atp/fil-info/gasquet-je-sais-que-mon-meilleur-niveau-reviendra-796780'} is not JSON serializable – Thomas Simonini Mar 16 '16 at 22:11
  • same error, `datetime` is not serializable, you'll have to format it like a string – eLRuLL Mar 16 '16 at 23:06
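One way to format the `datetime` as a string, sketched here under assumptions: the code is plain Python 3, the `item` dict is a hypothetical stand-in with two of the fields from the traceback, and `encode_default` is a helper name invented for this example. It relies on the `default` parameter of `json.dumps`, which is called for any value the encoder cannot handle on its own.

```python
import json
from datetime import datetime


def encode_default(value):
    """Fallback for json.dumps: turn datetime objects into ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    # Anything else is still unsupported, so keep the usual error.
    raise TypeError("%r is not JSON serializable" % (value,))


# Hypothetical item mirroring two fields from the traceback.
item = {
    "title": "Pogba aurait rencontr\u00e9 les dirigeants du PSG",
    "publishedDate": datetime(2016, 3, 16, 21, 53, 10, 289000),
}

body = json.dumps(item, default=encode_default)
```

Whether an ISO 8601 string is the format the backend expects for date columns is a separate question; this only removes the serialization error.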
0

I found the solution: convert the item with dict(item) before passing it to json.dumps.

class Newscrawlbotv01Pipeline(object):
    def process_item(self, item, spider):
        for data in item:
            if not data:
                raise DropItem("Missing data!")
        connection = httplib.HTTPSConnection(
            settings['PARSE'],
            settings['PORT']
        )

        connection.connect()
        connection.request('POST', '/1/classes/Articles', json.dumps(dict(item)), {
            "X-Parse-Application-Id": "WW",
            "X-Parse-REST-API-Key": "WW",
            "Content-Type": "application/json"
        })
        log.msg("Question added to PARSE !", level=log.DEBUG, spider=spider)
        return item
        #self.collection.update({'url': item['url']}, dict(item), upsert=True)
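One thing that may be worth double-checking in the pipeline above: iterating over a dict-like item yields its field names, so `if not data` tests the names rather than the values. Below is a minimal sketch of a value-based check combined with the `dict(item)` conversion. `build_body` is a hypothetical helper, and `ValueError` stands in for Scrapy's `DropItem` so that the snippet runs in plain Python 3 without Scrapy installed.

```python
import json


def build_body(item):
    """Return the JSON request body for an item.

    Raises ValueError (a stand-in for DropItem) if any field value is empty.
    """
    fields = dict(item)  # works for plain dicts and other mappings
    for name, value in fields.items():
        if not value:
            raise ValueError("Missing %s!" % name)
    return json.dumps(fields)


# Usage with a plain dict standing in for the scraped item.
body = build_body({"title": "Example", "language": "FR"})
```

Inside `process_item`, the returned string would take the place of `json.dumps(dict(item))` in the `connection.request` call.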