I am trying to use Scrapy with Python to scrape some pages from inside my company's network. I started with the Scrapy tutorial: https://doc.scrapy.org/en/latest/intro/tutorial.html
When I run code identical to the code on the tutorial page, I get this error:
```
2018-01-24 11:49:04 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/robots.txt> (failed 1 times): DNS lookup failed: no results for hostname lookup: quotes.toscrape.com.
```
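To check whether the machine can resolve external host names directly at all (which is what the log message suggests it cannot), the lookup can be tried from plain Python, outside Scrapy. This is a minimal standard-library sketch; `can_resolve` is a hypothetical helper name, not part of Scrapy:

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver returns at least one address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Raised when the name cannot be resolved (unknown host, no DNS access, ...).
        return False
    return True


# On a machine that can only reach the Internet through a proxy,
# can_resolve("quotes.toscrape.com") would be expected to return False.
```

If this returns False for `quotes.toscrape.com`, direct DNS lookups are blocked and all traffic has to go through the proxy.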
So I tried to send the requests through my company's proxy server, which I also have to do for other tools (for example, pip install). I did this by changing the tutorial code, following Amom's approach from the question "Scrapy and proxies":
```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            request = scrapy.Request(url=url, callback=self.parse)
            # Route this request through the company proxy (placeholder value).
            request.meta['proxy'] = "user@proxy:port"
            yield request

    def parse(self, response):
        # Save the raw HTML of each page to quotes-<page number>.html.
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
```
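Two things may be worth checking. First, the Scrapy `HttpProxyMiddleware` documentation shows `meta['proxy']` values that include a scheme, such as `http://some_proxy_server:port` or `http://username:password@some_proxy_server:port`, whereas the spider above uses `user@proxy:port`. Second, the failing request in the log is `robots.txt`, which Scrapy requests on its own, so a proxy set only in `start_requests()` may not apply to it. The sketch below builds a scheme-prefixed proxy URL and exports it through the `http_proxy`/`https_proxy` environment variables, which `HttpProxyMiddleware` reads (via `urllib.request.getproxies()`) when the crawler is created. The helper `build_proxy_url` and all host, port and credential values are hypothetical placeholders:

```python
import os
from urllib.parse import quote
from urllib.request import getproxies


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Return a proxy URL such as http://user:password@host:port.

    Credentials are percent-encoded so characters like '@' or ':' in a
    password do not break the URL.
    """
    auth = ""
    if user is not None:
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


# Hypothetical values: replace with the real proxy host, port and credentials.
proxy = build_proxy_url("proxy.example.com", 8080, user="jdoe", password="s3cr@t")

# Option 1: export the proxy before the crawler is created, so every request
# that has no meta['proxy'] of its own (robots.txt included) uses it.
os.environ["http_proxy"] = proxy
os.environ["https_proxy"] = proxy

# Option 2: set it per request inside start_requests():
#     request.meta['proxy'] = proxy

print(getproxies()["http"])
```

Exporting the same variables in the shell before running `scrapy crawl quotes` should have the same effect as Option 1.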
Does anybody know how to solve this? I really need to get this to work. Thanks in advance.