
I'm using Python with undetected_chromedriver to crawl https://www.carrefour.fr/, but I'm being blocked by the Cloudflare challenge.

The code below works fine when I run it locally on my Mac, but when I run it in an Ubuntu container on AWS, it returns the Cloudflare (CF) captcha page.

Any ideas?

import undetected_chromedriver as uc
from fake_useragent import UserAgent

ua = UserAgent()

options = uc.ChromeOptions()
options.add_argument("--start-maximized")
options.add_argument("--no-sandbox")
# Chrome's command-line switch for overriding the User-Agent is --user-agent=<value>
options.add_argument('--user-agent={0}'.format(ua.chrome))

proxy = {'proxy': {'http': 'REMOVED', 'https': 'REMOVED'}}
url = "https://www.carrefour.fr/set-store/79?redirect=/r/jardin-amenagement-dexterieur/mobilier-jardin/chaises"
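# seleniumwire_options is a selenium-wire keyword; plain undetected_chromedriver may ignore it
# unless uc is imported from seleniumwire.undetected_chromedriver
driver = uc.Chrome(version_main=113, options=options, headless=True, seleniumwire_options=proxy)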
driver.get(url)
print(driver.page_source)
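
To tell the two outcomes apart in logs (real page on my Mac, challenge page in the container), here is a minimal sketch of a check on the returned HTML. The marker strings and the helper name `looks_like_cf_challenge` are my own assumptions based on what challenge pages commonly contain, not a documented Cloudflare contract, so they may need adjusting.

```python
# Heuristic substrings that often appear in a Cloudflare challenge page.
# These are assumptions for illustration, not an official or stable list.
CHALLENGE_MARKERS = (
    "just a moment",
    "cf-chl",
    "challenge-platform",
    "cf-browser-verification",
)


def looks_like_cf_challenge(page_source: str) -> bool:
    """Return True if the HTML contains any of the assumed challenge markers."""
    lowered = page_source.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

Calling `looks_like_cf_challenge(driver.page_source)` right after `driver.get(url)` would give a simple True/False to compare between the local run and the AWS run.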
  • It seems likely that the site is adding a captcha to prevent automated / scalable crawling. – stdunbar May 23 '23 at 20:30
  • To add on to @stdunbar's comment - if there was a trivial way to bypass this mechanism, there would be no point to the webmaster adding it in the first place. It's obvious the site does not want you to be doing *exactly what you're trying to do*. – esqew May 23 '23 at 20:33
  • You might try using [Selenium-Profiles](https://github.com/kaliiiiiiiiii/Selenium-Profiles) or proxies – kaliiiiiiiii May 30 '23 at 07:18
