
I'm trying to get past the Cloudflare protection of a website to scrape some items from it, but the cloudscraper Python module is not working.

Whenever I run it, I receive this error:

cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

Here is a simplified version of the code I'm using:

import cloudscraper
from bs4 import BeautifulSoup as soup


url = "http://adventurequest.life/"
scraper = cloudscraper.create_scraper()
html = scraper.get(url).text
page_soup = soup(html, "html.parser")
print(page_soup)

Do you guys have any idea how to fix this?

Aeiddius
  • The fix is already in the error message. Read it carefully. – baduker Jan 15 '21 at 09:32
  • Yeah, it says it's not available in the open-source version, but there is no paid version. – Aeiddius Jan 15 '21 at 09:37
  • Having the same issue. Has anyone found a fix? – xliang Jan 27 '21 at 08:18
  • The only way I fixed this was by asking the website owner for a user-agent token from Cloudflare. I was running a scraper on our community website, so I got one. I don't know about other sites. Apparently, there's still no bypass for this. – Aeiddius Jan 27 '21 at 08:59
  • Thanks. I hope the author will fix it soon (I will try to raise an issue on GitHub). Is there a paid version? I cannot find any. – xliang Jan 27 '21 at 09:43
  • That's the problem: there's no paid version. Hopefully the author will respond to your issue. – Aeiddius Jan 27 '21 at 10:10
  • @Aeiddius, have you fully resolved this problem? I am facing the same one :) – shawnngtq Mar 20 '21 at 15:05
  • @shawnngtq Unfortunately, I haven't found a solution. Do you have the latest cloudscraper version? The author apparently released an update the day after my last comment. I haven't tested whether the issue is still there. – Aeiddius Mar 20 '21 at 15:54
  • @Aeiddius, I'm using the latest version (1.2.56), same issue ... – shawnngtq Mar 21 '21 at 06:56
  • Same error in version 1.2.58 – Dan A.S. Aug 27 '21 at 10:57

2 Answers


The cloudscraper library does not provide a bypass for the Cloudflare version 2 captcha challenge in the free version. To scrape such sites, one alternative is to use a third-party captcha solver.

cloudscraper supports several third-party captcha providers (the list is in its README).

You can subscribe to a provider's API and pass the API key to cloudscraper, as in this example from the README:

scraper = cloudscraper.create_scraper(
  interpreter='nodejs',
  captcha={
    'provider': '2captcha',
    'api_key': 'your_2captcha_api_key'
  }
)
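To avoid hardcoding the key, the `captcha` dictionary above could be built from an environment variable. This is only a minimal sketch: the helper name `captcha_config` and the variable name `CAPTCHA_API_KEY` are assumptions made for illustration and are not part of cloudscraper.

```python
import os


def captcha_config(provider="2captcha", env_var="CAPTCHA_API_KEY"):
    """Build the captcha dict from an environment variable (hypothetical helper)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {"provider": provider, "api_key": api_key}


# Intended usage, assuming cloudscraper is installed:
#   scraper = cloudscraper.create_scraper(
#       interpreter='nodejs',
#       captcha=captcha_config(),
#   )
```

The solver still has to be paid for; this only keeps the key out of the source file.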

If you still run into issues, you can try other anti-bot bypass providers. For example, you can route requests through a third-party proxy:

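import requests

url = "https://the.url/to/scrape"
proxy = "http://subscribed.proxy/"  # proxy endpoint from your provider
proxies = {"http": proxy, "https": proxy}  # route both schemes through the proxy
# verify=False disables TLS certificate checks; use it only if the proxy requires it
response = requests.get(url, proxies=proxies, verify=False)
print(response.text)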
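The same proxies mapping can also be handed to a cloudscraper session, since it is passed to `get` the same way. Here is a small sketch that validates the proxy URL before building the mapping; `build_proxies` is a hypothetical helper, and the accepted schemes are an assumption.

```python
from urllib.parse import urlparse


def build_proxies(proxy_url):
    """Return a requests-style mapping that sends HTTP and HTTPS through one proxy."""
    parsed = urlparse(proxy_url)
    # Assumed set of acceptable schemes; adjust to what your provider offers.
    if parsed.scheme not in ("http", "https", "socks5", "socks5h") or not parsed.netloc:
        raise ValueError(f"not a usable proxy URL: {proxy_url!r}")
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("http://subscribed.proxy/")

# Intended usage, assuming cloudscraper is installed:
#   scraper = cloudscraper.create_scraper()
#   response = scraper.get("https://the.url/to/scrape", proxies=proxies)
```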
Cody Gray - on strike
Gidoneli

I hit the same error when using Scrapy with cloudscraper, but after I set COOKIES_ENABLED to True it worked fine:

Error

Traceback (most recent call last):
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.
2021-04-27 09:59:30 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.forever21.com/us/shop/catalog/category/f21/lingerie>
Traceback (most recent call last):
StopIteration: <403 https://www.forever21.com/us/shop/catalog/category/f21/lingerie>

before:

import cloudscraper

browser = cloudscraper.create_scraper()

# in middleware
req = spider.browser.get(
    url,
    proxies={'http': proxy, 'https': https_proxy},
    headers={'referer': url},
)

after:

'COOKIES_ENABLED': True
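For context, that key/value pair belongs in the Scrapy settings. A minimal sketch of the two usual places follows; the name `CUSTOM_SETTINGS` is a placeholder chosen here, and in a real project the dict would be assigned to the spider class's `custom_settings` attribute.

```python
# Option 1: project-wide, as a line in the project's settings.py
COOKIES_ENABLED = True

# Option 2: per spider, as the dict assigned to the spider's custom_settings
# (CUSTOM_SETTINGS is a stand-in name for this sketch)
CUSTOM_SETTINGS = {
    "COOKIES_ENABLED": True,  # keep cookies between requests
}
```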

In your code, which uses cloudscraper directly and bs4 only for parsing, the session keeps cookies by default, so I tried it and it ran normally:

url = "http://adventurequest.life/"
scraper = cloudscraper.create_scraper()
html = scraper.get(url).text
page_soup = soup(html, "html.parser")
print(page_soup)

<!DOCTYPE doctype html>
<html lang="en" style="min-height: 100%;">
<head>
<!-- Required meta tags -->
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
<meta content="Auto Quest Worlds" name="twitter:title"/>
<meta content="aqw bots, adventure quest bots, aqw cheat, aqw hack, aqw exploits, grimoire download, adventure quest worlds bot, leveling bot aqw, botting mmorpg, aqw private server, aqworlds private server, aqw server, aqw ps, aqw private, skidson, aqw pirata, servidor de aqw, adventure quest worlds private, dragonfable private server, adventure quest private server, free to play mmorpg, free online games, browser games, jogos online, jogos criancas, jogos de navegador, best aqw private server, best online mmorpg, best browser mmorpg, habbo servidor privado, habbo retro, habbo private server, runescape private server, high rates aqw, aqw items, aqworlds wiki" name="keywords"/>
<meta content="https://adventurequest.life/" name="twitter:url"/>

Maybe you should check your machine's OpenSSL version and update it, or upgrade cloudscraper.

My cloudscraper version is 1.2.58.
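A quick way to check both versions from Python, using only the standard library (a sketch; `installed_version` is a helper name made up for this example):

```python
import ssl
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# OpenSSL build that this Python interpreter is linked against
openssl_version = ssl.OPENSSL_VERSION
cloudscraper_version = installed_version("cloudscraper")

print("OpenSSL:", openssl_version)
print("cloudscraper:", cloudscraper_version or "not installed")
```

Note that this reports the OpenSSL that Python itself uses, which may differ from the `openssl` binary on your PATH.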

Fan