Scrapy and python: DNS lookup failed: no results for hostname lookup - proxy issue?

Question

I am trying to use Scrapy and Python to scrape some pages from within my company's network. I started with the Scrapy tutorial at https://doc.scrapy.org/en/latest/intro/tutorial.html

When I run code identical to that on the tutorial page, I get the error:

2018-01-24 11:49:04 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/robots.txt> (failed 1 times): DNS lookup failed: no results for hostname lookup: quotes.toscrape.com.
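
The message says that the local resolver returned no address for the host. One way to check this outside of Scrapy is to ask the resolver directly with the standard library. This is only a minimal sketch: `can_resolve` is a hypothetical helper name, not part of Scrapy, and a `False` result on a machine that reaches the internet only through a proxy would be consistent with the error above.

```python
import socket


def can_resolve(hostname, port=80):
    """Return True if the local resolver finds at least one address for hostname."""
    try:
        return len(socket.getaddrinfo(hostname, port)) > 0
    except socket.gaierror:
        # Raised when name resolution fails, e.g. no DNS answer for the host.
        return False


if __name__ == "__main__":
    # quotes.toscrape.com is the host named in the error message.
    print(can_resolve("quotes.toscrape.com"))
```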

So I tried to configure my proxy server to get a connection, which I also have to do for other tools such as pip install. I did this by changing the tutorial code, using Amom's approach from "Scrapy and proxies":

import scrapy
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            request = scrapy.Request(url=url, callback=self.parse)
            request.meta['proxy'] = "user@proxy:port"
            yield request

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

Does somebody know how to solve this? I really need to get this to work. Thanks in advance.

Can you first check in a normal browser and see if the proxy is actually working or not? — Tarun Lalwani, Jan 24 '18 at 11:47
I just tried using Internet Explorer; however, I am not allowed to change the connection settings. — Cactus, Jan 24 '18 at 11:47
I cannot test it using Firefox either, as I am not allowed to change the connection settings. I have tried using `pip --proxy ... install packagename` and that worked just fine. Do you have any idea what I can do? — Cactus, Jan 24 '18 at 12:23
From your machine, can you open http://quotes.toscrape.com/page/1? — Gaur93, Jan 24 '18 at 17:22
Can you try with `request.meta['proxy'] = "http://user@proxy:port"`? — Tarun Lalwani, Jan 25 '18 at 06:32
@TarunLalwani: No, this unfortunately does not work, however my proxy is 'https://user@proxy:port'. Mr. Gaur93: yes, this works. — Cactus, Jan 25 '18 at 12:44
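
The suggestion in the comments amounts to giving the proxy value an explicit scheme. Scrapy's documentation for `HttpProxyMiddleware` shows proxy values in the form `http://some_proxy_server:port`. The sketch below, using only the standard library, prefixes a scheme when one is missing. `with_scheme` and the host `proxy.example` are hypothetical names used for illustration, and whether `http` or `https` is correct depends on the actual proxy.

```python
def with_scheme(proxy, default_scheme="http"):
    """Return proxy unchanged if it already has a scheme, else prefix default_scheme://."""
    if "://" in proxy:
        return proxy
    return "%s://%s" % (default_scheme, proxy)


if __name__ == "__main__":
    # proxy.example:8080 is a placeholder, not a real proxy address.
    print(with_scheme("user@proxy.example:8080"))
    print(with_scheme("https://user@proxy.example:8080"))
```

In the spider from the question, the result would be assigned as `request.meta['proxy'] = with_scheme("user@proxy:port")` before yielding the request.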

score 0 · Answer 1 · answered Apr 06 '21 at 04:24

That means they are blocking Scrapy, i.e., they are not allowing anyone to scrape their website. I'm sorry, but you can't do anything about it.

veera sekhar