
I'm using Scrapy 2.3 with the scrapy-fake-useragent library to scrape a major e-commerce website. When I run the spider on my local computer, Scrapy rotates user agents via the library and scrapes the information I need, bypassing the website's attempts to block Scrapy.

When I run the same spider from my Amazon EC2 server, Scrapy recognizes the scrapy-fake-useragent downloader middleware, but the requests still get blocked.
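
For context, my configuration follows the library's README; the sketch below uses the middleware paths and priorities documented there, so treat it as an approximation of my settings rather than a verbatim copy of my file:

```python
# settings.py -- sketch based on the scrapy-fake-useragent README defaults.

DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in user-agent and retry middlewares so the
    # library can handle both jobs.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    # Rotate a random user agent on every request, and retry failed
    # requests with a fresh one.
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
```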

Is there some difference between a local computer and a remote server that I'm not aware of?

  • the `e-commerce website` could and SHOULD have implementations which block such behavior. They are on to you :) – Ron Aug 30 '20 at 04:28
  • Why does it then work on my local computer? Do they recognize the remote server when it is used? Thanks for the quick response :) – Dayne Tran Aug 30 '20 at 04:37
  • I do not think anyone here can give you correct answer, as we do not know the setup of that website, or your environments. There is just so little actual info, and so much room for guesses... we need much more details, I think, so that we can pinpoint the actual real reason. – Ron Aug 30 '20 at 06:13
  • Time to buy proxies, and use them. – Umair Ayub Aug 30 '20 at 12:45
  • Some antibot services can tell whether the source IP address comes from a home (residential IP) or a server (datacenter IP). – Gallaecio Sep 01 '20 at 10:04
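
Update: to make the proxy suggestion from the comments concrete, here is a minimal sketch of routing requests through a (residential) proxy using Scrapy's built-in HttpProxyMiddleware, which reads `request.meta['proxy']`. The spider name, start URL, and proxy endpoint are placeholders, not a working configuration:

```python
import scrapy


class ProductSpider(scrapy.Spider):
    # Hypothetical spider -- the name and start URL are placeholders.
    name = 'products'
    start_urls = ['https://www.example.com/some-product-page']

    # Placeholder proxy endpoint; substitute a real provider's URL and
    # credentials. Scrapy's HttpProxyMiddleware (enabled by default)
    # applies it per request when set in request.meta['proxy'].
    PROXY = 'http://username:password@proxy.example.com:8000'

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, meta={'proxy': self.PROXY})

    def parse(self, response):
        # Parsing logic stays the same as when running locally.
        yield {'title': response.css('title::text').get()}
```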

0 Answers