I am trying to use Scrapy and Python to scrape some pages from within my company's network. I started with the Scrapy tutorial at https://doc.scrapy.org/en/latest/intro/tutorial.html

When I run code identical to that on the tutorial page, I get this error:

2018-01-24 11:49:04 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/robots.txt> (failed 1 times): DNS lookup failed: no results for hostname lookup: quotes.toscrape.com.

So I tried to configure my proxy server to get a connection, as I also have to do for pip install, for example. I changed the tutorial code following Amom's approach from "Scrapy and proxies":

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            request = scrapy.Request(url=url, callback=self.parse)
            # Route the request through the company proxy (placeholder value).
            request.meta['proxy'] = "user@proxy:port"
            yield request

    def parse(self, response):
        # Save each page body to quotes-<page>.html.
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
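
For reference, the Scrapy documentation for HttpProxyMiddleware gives the `proxy` meta value as a full URL including the scheme, for example `http://username:password@some_proxy_server:port`. Below is a minimal sketch of building such a value. The helper name `build_proxy_url`, the host `proxy.example.com`, the port `3128`, and the credentials are all hypothetical placeholders, not values from my setup.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Return a proxy URL such as 'http://user:pass@host:3128'."""
    credentials = ""
    if user:
        # Percent-encode credentials so characters like '@' or ':'
        # cannot be confused with the URL's own separators.
        credentials = quote(user, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return "%s://%s%s:%d" % (scheme, credentials, host, port)


# Hypothetical values; replace them with the real proxy settings.
PROXY = build_proxy_url("proxy.example.com", 3128, user="user", password="p@ss")
print(PROXY)

# Inside start_requests() the value would then be assigned as:
#     request.meta['proxy'] = PROXY
```

Whether this resolves the DNS error depends on the network; it only shows the URL shape that the documentation describes.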

Does anybody know how to solve this? I really need to get this working. Thanks in advance.

Cactus

1 Answer

That means they are blocking Scrapy, i.e., they do not allow anyone to scrape their website. I'm sorry, but you can't do anything about it.