
I am new to Python and especially to Scrapy. I wanted to make a spider that collects all the comments from a Reddit page. It finds the comments, but it does not save them to a .csv file. Here is my spider:

    import scrapy
    from reddit.items import RedditItem


    # scrapy.Spider rather than CrawlSpider: CrawlSpider uses parse() internally,
    # so parse() should not be overridden on a CrawlSpider subclass
    class TestSpider(scrapy.Spider):
        name = "test"
        allowed_domains = ["www.reddit.com"]
        start_urls = ['https://www.reddit.com/r/FIFA/comments/7pulch/introduction_community_update/']

        def parse(self, response):
            # Each comment is a <div> whose data-type attribute contains "comment"
            selector_list = response.xpath('//div[contains(@data-type, "comment")]')

            for selector in selector_list:
                item = RedditItem()
                # .extract() returns a list of strings for each field
                item['comment_text'] = selector.xpath('.//div[contains(@class, "usertext-body may-blank-within md-container ")]/div').extract()
                item['comment_author'] = selector.xpath('./@data-author').extract()
                item['comment_id'] = selector.xpath('./@id').extract()
                yield item

And this is an example of the error I get for every item:

    2018-03-01 13:10:23 [scrapy.core.scraper] ERROR: Error processing {'comment_author': [u'Vision322'],
     'comment_id': [u'thing_t1_dsk7a5t'],
     'comment_text': [u'<div class="md"><p>hello and welcome!\nhow are     you?   </p>\n</div>']}
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/Users/Torben/reddit/reddit/pipelines.py", line 11, in process_item
        item['title'] = ''.join(item['title']).upper()
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 59, in __getitem__
        return self._values[key]
    KeyError: 'title'

Can anyone tell me what the problem is?

Torb
  • Before automating your XPath commands, first try them out manually and see if this works. – Dominique Mar 01 '18 at 13:21
  • The error seems to be in your pipeline, which is trying to access `item['title']`, which you're not creating in your spider – stranac Mar 01 '18 at 13:27
  • The error says that there is no key `title` in the data dictionary (or associative array, if you are a PHP programmer) sent to the pipelines; you need to send it from the spider code you posted in your question. – Umair Ayub Mar 01 '18 at 14:51
  • Thank you all, that helped! The problem was in the pipeline, which expected another field from my spider. – Torb Mar 01 '18 at 18:18
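As the comments point out, line 11 of `pipelines.py` reads `item['title']`, a field `TestSpider` never sets. Because the exception is raised inside the pipeline, the item does not reach Scrapy's feed exporter, which would explain why no CSV rows appear. Below is a minimal sketch of a fixed pipeline, assuming the spider yields only the three `comment_*` fields shown above. The class name `RedditPipeline` is hypothetical; use whatever class your project lists in `ITEM_PIPELINES` in `settings.py`. Instead of touching `title`, it joins each list returned by `.extract()` into one string and skips fields that are missing, so it cannot raise `KeyError`.

```python
# pipelines.py (sketch): clean up only the fields the spider actually yields.
# The original pipeline read item['title'], which TestSpider never sets.


class RedditPipeline(object):
    """Hypothetical name; match it to the entry in ITEM_PIPELINES."""

    # Fields populated by TestSpider.parse(); each value is a list from .extract()
    FIELDS = ('comment_text', 'comment_author', 'comment_id')

    def process_item(self, item, spider):
        for field in self.FIELDS:
            # Only touch fields that were set, so a missing key cannot raise KeyError
            if field in item:
                item[field] = ''.join(item[field]).strip()
        # Returning the item passes it on to later pipelines and the feed exporter
        return item
```

Scrapy pipelines are plain classes with a `process_item(self, item, spider)` method, so this needs no import. Removing the old pipeline from `ITEM_PIPELINES` would also stop the error. Once items pass through cleanly, `scrapy crawl test -o comments.csv` should write them to a CSV file through Scrapy's feed exports.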
