I use a spider to crawl many websites from a list. It works as I need, but now I additionally want to record the connection status. When running the spider I see some 404s, some 301s, and some DNS errors.
How can I get the connection status into my CSV?
import scrapy

class CmsSpider(scrapy.Spider):
    name = 'myspider'
    f = open("random.csv")
    start_urls = [url.strip() for url in f.readlines()]
    f.close()

    def parse(self, response):
        title = response.xpath('//title/text()').extract_first()
        url = response.request.url
        description = response.xpath('//meta[@name="description"]/@content').extract_first()
        yield {'URL': url, 'Page Title': title, 'Description': description}