
I am writing a simple site parser, but my request returns a 403 error, and when I get past that, the site serves me a puzzle captcha.

Python code:

import requests

# Browser-like headers. Adjacent string literals are joined by Python,
# so no stray whitespace ends up inside the header values.
headers = {
    "user-agent": (
        "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0"
    ),
    "accept-language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
    "accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
        "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
    ),
}

page = requests.get("https://www.leboncoin.fr/", headers=headers)
print(page.text)

Output, which is not what I expected:

<html>
<head><title>leboncoin.fr</title>
    <meta property="og:title" content="Rendez-vous sur leboncoin pour découvrir cette annonce !" />
    <meta property="og:image" content="https://img.datadome.co/captcha/page-customization/1872/866d27bc-26b6-476e-b41d-496f3e0a7fb4.jpeg" />
    <style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}
</style>
</head>
<body style="margin:0">
    <p id="cmsg">Please enable JS and disable any ad blocker</p>
    <script data-cfasync="false">var dd={'cid':'AHrlqAAAAAMAK9_SOZqlD0wAubFoPg==','hsh':'05B30BD9055986BD2EE8F5A199D973','t':'bv','s':2089,'e':'a80d02f20f355f979f960376ec28757d18dea1c6d5954e9447e59b99794c3ef2','host':'geo.captcha-delivery.com'}</script><script data-cfasync="false" src="https://ct.captcha-delivery.com/c.js"></script>
</body>
</html>
Gunesh Shanbhag
  • 403 means "forbidden". The site doesn't want you to do what you are trying to do. The captcha is there to keep out programmatic access. – BoarGules Dec 26 '22 at 07:48
  • What can be done to get around this? – Levon Avetisyan Dec 26 '22 at 08:27
  • Your error message says to enable JavaScript. For sites that depend on JavaScript running in a browser, the usual recommendation is to use `selenium`, which runs a headless browser under program control. That will meet the JavaScript requirement. But you may still run into permission errors, and if you try too hard to bypass them, the site may respond by banning your IP address. – BoarGules Dec 27 '22 at 06:39
  • Have you figured out a way to scrape it? Good luck: the site is protected by the DataDome anti-bot service. – Gilles Quénot May 23 '23 at 22:42
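
As a rough illustration of the comments above, a parser can at least recognize that it received the anti-bot page instead of the real content. This is only a sketch: the helper name `looks_blocked` and the marker list are hypothetical, and the markers are taken from the response body in the question, not from any documented DataDome interface.

```python
# Hypothetical markers, copied from the blocked response shown in the question.
# Treating them as signs of an anti-bot challenge is an assumption.
CHALLENGE_MARKERS = ("captcha-delivery.com", "datadome")


def looks_blocked(status_code: int, body: str) -> bool:
    """Return True if a response resembles the blocked page from the question.

    A 403 status, or any of the challenge markers in the body, counts as blocked.
    """
    lowered = body.lower()
    has_marker = any(marker in lowered for marker in CHALLENGE_MARKERS)
    return status_code == 403 or has_marker
```

With the `page` object from the question, `looks_blocked(page.status_code, page.text)` would let the script stop early instead of trying to parse the challenge page.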

1 Answer


You could try using Selenium, or at least JavaScript, as mentioned in this answer.

The website may be detecting that the request is made by Python rather than by an actual browser.
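
A minimal sketch of the Selenium suggestion, assuming Selenium 4 and a local Chrome installation. The function name `fetch_rendered_html` and the timeout value are hypothetical. This only makes the page's JavaScript run in a real browser; the site may still show its captcha or refuse access.

```python
def fetch_rendered_html(url: str, timeout: float = 15.0) -> str:
    """Load `url` in headless Chrome and return the HTML after scripts have run."""
    # Imported here so this module can be imported even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the browser reports that the document has finished loading.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        # Always close the browser, even if loading fails.
        driver.quit()
```

It could be called as `html = fetch_rendered_html("https://www.leboncoin.fr/")`. If the returned HTML is still the challenge page, the site is declining automated access and the browser alone will not change that.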

kaliiiiiiiii