
I ran into a Cloudflare issue when I tried to scrape a website.

I have this code:

import cloudscraper

url = "https://author.today"
scraper = cloudscraper.create_scraper()
print(scraper.post(url).status_code)

Running it raises:

cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

I searched for a workaround but couldn't find any solution. If you visit the website in a browser, you see:

Checking your browser before accessing author.today.

Is there any way to bypass Cloudflare in my case?

Nickolas
    The exception message implies a solution. – Klaus D. Jan 06 '21 at 23:14
  • `not available in the opensource (free) version` - so pay for this. – furas Jan 07 '21 at 00:32
    There is apparently "no paid version". However, the docs state: ```Cloudflare modifies their anti-bot protection page occasionally, So far it has changed maybe once per year on average. If you notice that the anti-bot page has changed, or if this module suddenly stops working, please create a GitHub issue so that I can update the code accordingly.```. It suddenly stopped working for me too, so I assume they changed strategy. – Bastien Bastien Jan 07 '21 at 15:27
    Interestingly though, even when I copy the chrome request and resend it (with all cookies) from curl, using the same IP, it doesn't seem to fool CloudFlare. I wonder why that is and how would cloudflare differentiate my browser from cURL, when they both make the same request. (nb, that method of copying the request headers, used to work... not anymore though...) – Bastien Bastien Jan 07 '21 at 15:30
  • The exception indeed contains a hint. But I couldn't find any non-free version. – Nickolas Jan 08 '21 at 00:24
  • @Nickolas, have you found any solution? – shawnngtq Mar 20 '21 at 15:08
  • Seems they made fun of us. – Nabi K.A.Z. May 26 '21 at 09:59
  • I'm scraping 670 pages; the code works well until page 100 and then throws this exception. Did any of you guys find any solution or an alternate method? @shawnngtq Nabi K.A.Z. nickolas – Madhur Yadav Jul 04 '21 at 01:50
  • No, I haven't found any solution yet. If somebody finds one, please let me know @MadhurYadav – Nickolas Jul 05 '21 at 09:28
  • @MadhurYadav In your case, maybe you could just scrape 100 pages, wait 10, 20, 30 (who knows?) minutes or so, then scrape another 100 pages, etc. By the way, there is no paid version of cloudscraper; it's just really hard to keep up with Cloudflare's strategies. – Tommy A. Nov 03 '21 at 23:22
  • @BastienBastien they do, among other things, SSL handshake fingerprinting. And Chrome uses BoringSSL as its TLS library. – Paolo Feb 21 '22 at 14:12
  • @Paolo it seems that the modern viable solution is now to use Selenium, just like FlareSolverr does: https://github.com/FlareSolverr/FlareSolverr – Bastien Bastien Feb 21 '22 at 14:34
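
For reference, the FlareSolverr mentioned in the last comment runs as a local service that drives a real browser and exposes a small JSON API. A rough sketch of calling it from Python, assuming a default install listening on port 8191 and the documented request.get command (check the project README for the exact payload and response format):

import requests

payload = {
    "cmd": "request.get",          # ask FlareSolverr to fetch the page for us
    "url": "https://author.today",
    "maxTimeout": 60000,           # give the challenge up to 60 s to resolve
}
resp = requests.post("http://localhost:8191/v1", json=payload)
data = resp.json()
print(data.get("status"))                       # "ok" when the challenge was solved
print(data.get("solution", {}).get("status"))   # HTTP status of the target page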

6 Answers


Install httpx

pip3 install httpx[http2]

Define an HTTP/2 client

import httpx

client = httpx.Client(http2=True)

Make request

response = client.get("https://author.today")
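
To confirm that HTTP/2 was actually negotiated, a quick sanity check using attributes of the httpx response object:

print(response.http_version)   # expected "HTTP/2"
print(response.status_code)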

Cheers!

Zorome

Although it does not seem to work for this site, sometimes adding some parameters when initializing the scraper helps:

import cloudscraper

url = "https://author.today"
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'android',
        'desktop': False
    }
)
print(scraper.post(url).status_code)
dcts

I'd try to create a Playwright scraper that mimics a real user; this works for me most of the time, you just need to find the right settings (they can vary from website to website). Otherwise, if the website has a native app, try to figure out how the app behaves and then mimic it.
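
A minimal sketch of such a Playwright scraper, assuming the sync API; the launch and context settings (headed mode, locale, viewport) are only examples of the knobs to tune per site, not a guaranteed bypass:

from playwright.sync_api import sync_playwright

url = "https://author.today"

with sync_playwright() as p:
    # a headed, ordinary-looking Chromium context tends to raise fewer flags than default headless
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        locale="en-US",
        viewport={"width": 1366, "height": 768},
    )
    page = context.new_page()
    page.goto(url, wait_until="networkidle")
    page.wait_for_timeout(5000)  # give a possible Cloudflare interstitial time to resolve
    print(page.title())
    html = page.content()
    browser.close()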


I can suggest the following workflow to "try" to avoid Cloudflare WAF/bot mitigation:

  • don't cycle user agents, proxies or weird tunnels to surf
  • don't use fixed IP addresses; prefer residential-style lines like xDSL, home links and 4G/LTE
  • try to appear as a mobile device instead of a desktop/tablet
  • try to reproduce realistic pointer movements, i.e. record your mouse moves and replay them 1:1 while scraping (yes, you need JS enabled and a headless browser able to pass as a "common" one; see the sketch at the end of this answer)
  • don't cycle across different Cloudflare-protected entities, otherwise the scraping IP will be greylisted in a minute (i.e. build your own list of targets to stay away from; keep touching such entities and you will end up on the CF blacklist for sure)
  • try to reproduce real-life navigation in all aspects, including errors, waits and more
  • check the IP you used against popular blacklists after every scrape, otherwise nasty errors will appear shortly (CrowdSec is a good starting point)
  • the usual scrape poses as Googlebot; a single regex WAF rule on Cloudflare will block 99.99% of those attempts, so avoid faking Google and try to be LESS evil instead (e.g. ask webmasters for APIs or a data export, if any).

Source: I have been using Cloudflare with hundreds of domains and thousands of records (Enterprise plan) since the beginning of the company.

That way you will get closer to the goal (and you will help them increase the overall security).
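
A sketch of the pointer-movement replay mentioned in the list above, using Playwright purely as an example driver; the recorded coordinates are made up and would come from your own capture of a real session:

from playwright.sync_api import sync_playwright

# hypothetical capture of a real user's mouse path: (x, y, pause in milliseconds)
recorded_path = [(120, 300, 80), (260, 340, 120), (420, 380, 90), (640, 410, 150)]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://author.today")
    for x, y, pause in recorded_path:
        page.mouse.move(x, y, steps=15)  # steps makes the move gradual instead of a single jump
        page.wait_for_timeout(pause)
    print(page.title())
    browser.close()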

fab23
import cfscrape
from fake_useragent import UserAgent

ua = UserAgent()

s = cfscrape.create_scraper()

# send the request with a random User-Agent header (note the header name must be "User-Agent")
k = s.post("https://author.today", headers={"User-Agent": ua.random})
print(k)  # prints the Response object, e.g. <Response [200]>
Hello

I used this line: scraper = cloudscraper.create_scraper(browser={'browser': 'chrome','platform': 'windows','mobile': False})

and then used the httpx package after that, wrapping the remaining code in with httpx.Client() as s: // Remaining Code

And I was able to bypass the error cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.
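
One plausible reading of this answer (purely an assumption on my part, since the remaining code is not shown) is that cloudscraper is only used to obtain browser-like headers and cookies, which are then reused by the httpx client:

import cloudscraper
import httpx

scraper = cloudscraper.create_scraper(
    browser={'browser': 'chrome', 'platform': 'windows', 'mobile': False}
)
# assumption: reuse the scraper's Chrome-like headers and any collected cookies in httpx
with httpx.Client(headers=dict(scraper.headers), cookies=scraper.cookies) as s:
    r = s.get("https://author.today")
    print(r.status_code)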

user2284144
    "and then used httpx package after that with httpx.Client() as s: //Remaining Code" Could you please elaborate on what exactly did you do with it and what is the "remaining code"? As is, this answer is unusable. – Kryomaani Jan 08 '23 at 17:45