I am new to Python and especially to Scrapy. I want to build a spider that collects all the comments from a Reddit page. It finds the comments, but it does not save them to a .csv file. Here is my spider:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.loader import ItemLoader
from reddit.items import RedditItem

class TestSpider(CrawlSpider):
    name = "test"
    allowed_domains = ["www.reddit.com"]
    start_urls = ['https://www.reddit.com/r/FIFA/comments/7pulch/introduction_community_update/']

    def parse(self, response):
        selector_list = response.xpath('//div[contains(@data-type, "comment")]')
        for selector in selector_list:
            item = RedditItem()
            item['comment_text'] = selector.xpath('.//div[contains(@class, "usertext-body may-blank-within md-container ")]/div').extract()
            item['comment_author'] = selector.xpath('./@data-author').extract()
            item['comment_id'] = selector.xpath('./@id').extract()
            yield item
And this is one example of the error I get at every step:
2018-03-01 13:10:23 [scrapy.core.scraper] ERROR: Error processing
{'comment_author': [u'Vision322'],
'comment_id': [u'thing_t1_dsk7a5t'],
'comment_text': [u'<div class="md"><p>hello and welcome!\nhow are you? </p>\n</div>']}
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/Torben/reddit/reddit/pipelines.py", line 11, in process_item
    item['title'] = ''.join(item['title']).upper()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 59, in __getitem__
    return self._values[key]
KeyError: 'title'
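For context: the traceback points at `process_item` in `reddit/pipelines.py`, which reads `item['title']`, while the spider above only ever sets `comment_text`, `comment_author`, and `comment_id`. Below is a minimal sketch of a pipeline that guards against a missing key (the class name `RedditPipeline` and the choice to upper-case `comment_text` are assumptions for illustration, not the actual contents of `pipelines.py`):

```python
class RedditPipeline(object):
    """Sketch of a pipeline that only touches fields the item actually has."""

    def process_item(self, item, spider):
        # Guard with .get() so a field the spider never set does not
        # raise KeyError (which is what the traceback above shows).
        if item.get('comment_text'):
            item['comment_text'] = [text.upper() for text in item['comment_text']]
        return item


# Usage sketch: a plain dict stands in for a scrapy Item here.
pipeline = RedditPipeline()
result = pipeline.process_item({'comment_text': ['hello and welcome!']}, spider=None)
print(result['comment_text'])  # ['HELLO AND WELCOME!']
```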
Can anyone tell me what the problem is?