I am writing a simple scraper for a site. When I send a request, the site returns a 403 error, and when I try to get past that, it serves me a captcha with a puzzle.
My Python code:
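import requests

headers = {
    # Adjacent string literals are joined at compile time; the trailing
    # space/comma on each piece keeps the header values intact.
    "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0",
    "accept-language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/avif,image/webp,image/apng,*/*;q=0.8,"
              "application/signed-exchange;v=b3;q=0.9",
}

# timeout so the script does not hang on a stalled connection
page = requests.get("https://www.leboncoin.fr/", headers=headers, timeout=10)
print(page.status_code)  # prints 403
print(page.text)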
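Separately from the 403, the original way of splitting the header strings has a side effect worth checking: a backslash at the end of a line inside a string literal continues the literal on the next line without inserting any space, so the user-agent comes out as "...like Gecko)Chrome/108.0.0". A minimal demonstration (the variable names are just for illustration):

```python
# A backslash-newline inside a string literal is dropped entirely,
# so the two halves are glued together with no space between them.
with_backslash = "AppleWebKit/537.36 (KHTML, like Gecko)\
Chrome/108.0.0"

# Adjacent string literals are concatenated at compile time; the space
# has to be written explicitly at the end of the first piece.
with_concat = ("AppleWebKit/537.36 (KHTML, like Gecko) "
               "Chrome/108.0.0")

print(with_backslash)  # AppleWebKit/537.36 (KHTML, like Gecko)Chrome/108.0.0
print(with_concat)     # AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0
```

The accept header split the same way happens to survive, because the break falls right after a comma.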
Output, which is not what I expected:
<html>
<head><title>leboncoin.fr</title>
<meta property="og:title" content="Rendez-vous sur leboncoin pour découvrir cette annonce !" />
<meta property="og:image" content="https://img.datadome.co/captcha/page-customization/1872/866d27bc-26b6-476e-b41d-496f3e0a7fb4.jpeg" />
<style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}
</style>
</head>
<body style="margin:0">
<p id="cmsg">Please enable JS and disable any ad blocker</p>
<script data-cfasync="false">var dd={'cid':'AHrlqAAAAAMAK9_SOZqlD0wAubFoPg==','hsh':'05B30BD9055986BD2EE8F5A199D973','t':'bv','s':2089,'e':'a80d02f20f355f979f960376ec28757d18dea1c6d5954e9447e59b99794c3ef2','host':'geo.captcha-delivery.com'}</script><script data-cfasync="false" src="https://ct.captcha-delivery.com/c.js"></script>
</body>
</html>
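The page above is a DataDome challenge (note the img.datadome.co image and the captcha-delivery.com scripts), not the real listing page, so parsing it yields nothing useful. A minimal sketch of a check a scraper could run before parsing, assuming that a 403 status plus one of those markers in the body is a good enough signal; the helper name and marker list are hypothetical, taken only from the output shown here:

```python
# Hypothetical markers, taken from the challenge page in the output above
DATADOME_MARKERS = ("captcha-delivery.com", "datadome")


def looks_like_datadome_block(status_code: int, html: str) -> bool:
    """Heuristic: True if the response looks like a DataDome challenge page.

    Assumes the challenge is served with HTTP 403, as in the output above.
    """
    if status_code != 403:
        return False
    body = html.lower()
    return any(marker in body for marker in DATADOME_MARKERS)


sample = (
    '<script data-cfasync="false">'
    "var dd={'host':'geo.captcha-delivery.com'}</script>"
)
print(looks_like_datadome_block(403, sample))             # True
print(looks_like_datadome_block(200, "<html>ok</html>"))  # False
```

With requests, it would be called as looks_like_datadome_block(page.status_code, page.text), so the scraper can stop or log the block instead of trying to parse the challenge HTML.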