Scrapy Twisted ConnectionLost error

Question

I am learning Scrapy and am having a hard time figuring out this issue. My spider will not crawl the Macy's website and keeps throwing the following error:

[<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]

Things I've tried so far:

Setting headers and ROBOTSTXT_OBEY per this thread: Scrapy Shell: twisted.internet.error.ConnectionLost although USER_AGENT is set
Changing the user agent per this thread: How to prevent a twisted.internet.error.ConnectionLost error when using Scrapy?
Pinning cryptography to <2 per this thread: Scrapy twisted connection lost in non-clean fashion. No proxy. Already tried headers
Applying the monkeypatch from this thread: Twisted Python Failure - Scrapy Issues

I also ran scrapy shell "www.macys.com" from the command prompt and got the same error, so I'm guessing the issue is not with my spider. Could someone please help?

Can you still access the website in your browser? – Clément Denoix Nov 21 '17 at 04:09

score 1 · Answer 1 · answered Nov 21 '17 at 04:11

It seems that the IP address you are launching your scraper from has been blacklisted.

You might want to read the following: https://doc.scrapy.org/en/latest/topics/practices.html#avoiding-getting-banned

Also, you might want to tune the settings that control how many requests Scrapy sends and how quickly: CONCURRENT_REQUESTS, DOWNLOAD_DELAY, etc.
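
As a rough illustration of the throttling settings mentioned above, a project's settings.py could look something like the sketch below. The numeric values are assumptions chosen for the example, not tested recommendations for this site, and the AutoThrottle lines are an optional addition that the answer itself does not name.

```python
# settings.py (illustrative sketch; all values below are assumptions)

# Limit how many requests Scrapy keeps in flight at once.
CONCURRENT_REQUESTS = 4
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Wait roughly this many seconds between requests to the same site.
DOWNLOAD_DELAY = 2.0
# Vary the actual wait around DOWNLOAD_DELAY so requests look less regular.
RANDOMIZE_DOWNLOAD_DELAY = True

# Optional: let the AutoThrottle extension adapt the delay to server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

Slower crawling only helps if the ban is rate-based. If the server drops the connection on the very first request, as with the scrapy shell test in the question, these settings alone may not change the outcome.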

Clément Denoix

I commented out my USER_AGENT and the script worked. Any idea why that is? USER_AGENT = 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36' – user6055239 Nov 21 '17 at 20:37
Maybe TCP fingerprinting? – Raunaqss Jul 26 '21 at 06:56
