
When I use Splash to scrape Trivago, I get a CAPTCHA. This doesn't happen when I use curl or a normal Scrapy request.

Is there a way to use Splash but not get detected as a bot by Trivago?

Aminah Nuraini
  • Try changing the user agent string that Splash sends to the server. See [`splash:set_user_agent`](http://splash.readthedocs.io/en/stable/scripting-ref.html#splash-set-user-agent) in the documentation. – Tomáš Linhart Feb 14 '18 at 06:48
  • It doesn't work. I still get detected. – Aminah Nuraini Feb 14 '18 at 21:10
  • Do you get some valid responses first and then start getting blocked? How fast are you crawling? Be polite and lower the crawl speed. – bosnjak Feb 14 '18 at 21:22
  • I tried to render the page myself using the Splash console and didn't get any captcha. You were probably scraping it in a way that got you marked as a robot. Either use a proxy, or try to solve the captcha programmatically. What kind of captcha is it? – Tomáš Linhart Feb 15 '18 at 07:05
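
For reference, the two suggestions from the comments (setting the user agent with `splash:set_user_agent` and slowing the crawl down) could be combined roughly as below. This is only a sketch under assumptions: it assumes a Splash instance on `http://localhost:8050`, the helper names `build_payload`, `render`, and `crawl` are made up for illustration, and the user agent string and the 5-second delay are placeholders. Nothing here is confirmed to get past Trivago's bot detection.

```python
import json
import time
import urllib.request

# Assumption: Splash is running locally on its default port.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"

# Lua script executed by Splash: set the user agent before loading the page.
LUA_SOURCE = """
function main(splash, args)
  splash:set_user_agent(args.user_agent)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  return splash:html()
end
"""


def build_payload(url, user_agent, wait=2.0):
    """Build the JSON body for Splash's /execute endpoint (hypothetical helper)."""
    return {
        "lua_source": LUA_SOURCE,
        "url": url,
        "user_agent": user_agent,
        "wait": wait,
    }


def render(url, user_agent, wait=2.0, timeout=60):
    """POST the script to Splash and return the rendered HTML as text."""
    body = json.dumps(build_payload(url, user_agent, wait)).encode("utf-8")
    request = urllib.request.Request(
        SPLASH_EXECUTE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def crawl(urls, user_agent, delay=5.0):
    """Render each URL in turn, sleeping between requests to stay polite."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # placeholder delay; tune to the site's tolerance
        pages[url] = render(url, user_agent)
    return pages
```

Calling `crawl(["https://www.trivago.com/"], "<a real browser user agent string>")` would render the page through Splash with that user agent. If the captcha still appears, a proxy is the other option mentioned above.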

0 Answers