
Context

I am trying to build a small-scale bot using the Selenium and Requests modules in Python.
However, the webpage I want to interact with sits behind Cloudflare.
My Python script runs over Tor using the stem module.
My traffic analysis is based on Firefox's Developer Tools Network panel with Persist Logs enabled.

My findings so far:

  • Selenium's Firefox webdriver can often access the webpage without hitting the "checking browser" page (status code 503) or the "captcha" page (status code 403).
  • A Requests session object with the same user agent always gets the "captcha" page (status code 403).

If Cloudflare were checking my JavaScript functionality, shouldn't the Requests session get a 503 instead?

Code Example

import requests
from selenium import webdriver

# fp (Firefox profile) and fOptions (Firefox options) are defined elsewhere (not shown)
driver = webdriver.Firefox(firefox_profile=fp, options=fOptions)
driver.get("https://www.cloudflare.com")   # usually loads normally (200) without the browser check

session = requests.Session()
# ... applied socks5 proxy for both http and https ... #
session.headers.update({"User-Agent": driver.execute_script("return navigator.userAgent;")})
page = session.get("https://www.cloudflare.com")
print(page.status_code)  # prints 403
print(page.text)         # prints the "captcha page" HTML

Both Selenium and Requests use the same user agent and IP address.
Both send a plain GET without any parameters.
How does Cloudflare tell the two kinds of traffic apart?
Am I missing something?


I tried transferring the cookies from the webdriver to the Requests session to see whether that would get past the check, but had no luck.
Here is the code I used:

for c in driver.get_cookies():
    session.cookies.set(c['name'], c['value'], domain=c['domain'])
ku8zi
  • When using a web driver there are many things to consider, including but not limited to JavaScript APIs, HTTP headers, TLS headers, TCP fingerprint and IP fingerprint. With a web driver such as Selenium, Cloudflare will mark you as "safer" than when you use the requests module. You will need to modify many parts of the request to get a solution that scales. – GAP2002 Nov 25 '21 at 11:46

3 Answers


There are additional JavaScript APIs exposed to the webpage when using Selenium. If you can disable them, you may be able to fix the problem.

9pfs

Cloudflare doesn't only check HTTP headers or JavaScript; it also analyses the TLS handshake. I'm not sure exactly how it does this, but I've found that it can be circumvented by using NSS instead of OpenSSL (though NSS is not well integrated into Requests).
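To get a feel for why the TLS layer can differ, here is a small sketch that lists the cipher suites a default Python `ssl` context would offer. Requests goes through Python's `ssl` module, which is normally backed by OpenSSL, while Firefox uses NSS, so the two handshakes need not look alike. This only inspects a default context; Requests and urllib3 may configure their own, and the exact features Cloudflare looks at are an assumption.

```python
import ssl

# Default client-side context from the standard library (OpenSSL-backed
# on typical Python builds).
context = ssl.create_default_context()

# Cipher suites this context would offer during the TLS handshake.
offered = [cipher["name"] for cipher in context.get_ciphers()]

print(ssl.OPENSSL_VERSION)
print(len(offered), "cipher suites offered, for example:", offered[:3])
```

Comparing this list with what Firefox offers (visible in a packet capture) shows one way the two clients can be told apart even when the HTTP headers match.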

Kyuuhachi

The captcha response depends on the browser fingerprint. Sending only the cookies and the User-Agent is not enough.

Copy all the request headers from the Network tab of the developer console and send every key-value pair as a header through the Requests library.

In principle, this should work.
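A minimal sketch of that approach, assuming the headers were copied from the Network tab as plain "Name: value" lines. `parse_copied_headers` is a hypothetical helper, and the header values below are illustrative, not captured from a real session.

```python
def parse_copied_headers(raw):
    """Turn a block of 'Name: value' lines into a dict for Requests."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        # Skip the request line (e.g. "GET / HTTP/2") and blank lines.
        if not sep or " " in name.strip():
            continue
        headers[name.strip()] = value.strip()
    return headers


# Illustrative values only, not captured from a real session.
raw_headers = """
GET / HTTP/2
Host: www.cloudflare.com
Accept: text/html,application/xhtml+xml
Accept-Language: en-US,en;q=0.5
Upgrade-Insecure-Requests: 1
"""

headers = parse_copied_headers(raw_headers)
headers.pop("Host", None)  # Requests sets Host from the URL
print(headers)
```

The resulting dict can be passed to `session.headers.update(headers)` on the session from the question. As the other answers point out, matching headers may still not be sufficient on their own.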

Ashutosh Kumar