15

How does Cloudflare even know that this request came from a script even if I provided all the data, cookies and parameters when making a normal request? What does it check for? Am I doing something wrong? For example (I have redacted some of the values):

import requests

cookies = {
    '__Host-next-auth.csrf-token': '...',
    'cf_clearance': '...',
    'oai-asdf-ugss': '...',
    'oai-asdf-gsspc': '...',
    'intercom-id-dgkjq2bp': '...',
    'intercom-session-dgkjq2bp': '',
    'intercom-device-id-dgkjq2bp': '...',
    '_cfuvid': '...',
    '__Secure-next-auth.callback-url': '...',
    'cf_clearance': '...',
    '__cf_bm': '...',
    '__Secure-next-auth.session-token': '...',
}

headers = {
    'authority': 'chat.openai.com',
    'accept': 'text/event-stream',
    'accept-language': 'en-IN,en-US;q=0.9,en;q=0.8',
    'authorization': 'Bearer ...',
    'content-type': 'application/json',
    'cookie': '__Host-next-auth.csrf-token=...',
    'origin': 'https://chat.openai.com',
    'referer': 'https://chat.openai.com/chat',
    'sec-ch-ua': '"Brave";v="111", "Not(A:Brand";v="8", "Chromium";v="111"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Linux"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'sec-gpc': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
}

json_data = {
 ...
}

response = requests.post('https://chat.openai.com/backend-api/conversation', cookies=cookies, headers=headers, json=json_data)

I have tried different useragents to no avail, but I can't seem to figure out whats causing the problem in the first place.

The response comes back with error code 403 and HTML something like:

<html>
...
...
<h1>Access denied</h1>
  <p>You do not have access to chat.openai.com.</p><p>The site owner may have set restrictions that prevent you from accessing the site.</p>
  <ul class="cferror_details">
    <li>Ray ID: ...</li>
    <li>Timestamp: ...</li>
    <li>Your IP address: ...</li>
    <li class="XXX_no_wrap_overflow_hidden">Requested URL: chat.openai.com/backend-api/conversation </li>
    <li>Error reference number: ...</li>
    <li>Server ID: ...</li>
    <li>User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36</li>
  </ul>
...
...
</html>
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Anm
  • 447
  • 4
  • 15
  • 4
    I'm not sure this is an issue with being detected as a bot - I encounter the same error when I try to access http://chat.openai.com/backend-api/conversation in Safari and Firefox. –  Mar 30 '23 at 17:21
  • could be cloudflare blocking openai rather than thinking you're a bot, have you tried accessing other URLs? – WhatsThePoint Mar 31 '23 at 09:17
  • 2
    if you're just copying an existing request it could be the cookies change with every request or something – KTibow Apr 01 '23 at 22:55
  • Doesn't `chat.openai.com` require some type of authentication, such as a login or an API token? So how are you authenticating to the `openai.com` service prior to sending a post request? – Life is complex Apr 03 '23 at 12:50
  • Additionally, why aren't you using the [OpenAI Python Library](https://github.com/openai/openai-python) to interact with `chat.openai.com`? – Life is complex Apr 03 '23 at 13:02
  • @Lifeiscomplex That library does not allow you to interact with chatgpt, rather GPT3 and other variants. Plus, I don't want to spend anything. I have copied the network request that I made on the chat.openai.com website, and converted it to python requests. This request is enough for authentication, there is not supposed to be prior authentication. – Anm Apr 03 '23 at 15:00
  • @KTibow No, I checked, only the json_data changes – Anm Apr 03 '23 at 15:01
  • @Lifeiscomplex Go to chat.openai.com, bring up dev tools and go to the network tab, make a query and copy the request made as curl, then go to curlconverter.com to convert to python requests – Anm Apr 03 '23 at 15:28
  • @Lifeiscomplex The cookies are enough for auth – Anm Apr 03 '23 at 16:09

3 Answers3

3

I used to run a web data scraping/mining team that had to scrape about 20K sites every day for publicly available data. The only way we could reliably get past some of the harder bot checks (reCAPTCHA, Cloudflare, and some of the dozen or more AI/ML powered others) was to either use a proxy that made our traffic look like human user traffic, or to programmatically remote control a browser, and sometimes both.

Proxy providers seem to come and go every few years, and the one I used last is no longer around, but it looks like there are a few that guarantee a similar experience of "priming" your requests to make them look like legit traffic. This was necessary for some of the bot detection (reCAPTCHA specifically, but probably also Cloudflare) that use current traffic analysis and historical data to determine if you are a bot or not. None of these proxies would be free, but as long as you don’t need to make 300K requests per day they should be relatively cheap.

The remote control option was a container image that had the browser running on it and a Python-based remote control package* that would interact with the browser like a keyboard and mouse. This was important to defeat/avoid bot detection that would fingerprint the browser and/or observe behavior. There are a rather startling number of properties your browser gives up about itself via JavaScript and you are bound to forget one of them if you aren’t just using a regular browser. Those properties get inspected for any "Bot" flags in conjunction with how fast and what you are clicking on when visiting the page to determine if you are human or not.

* It was PyAutoGui, PyScreeze to take screenshots and pyTesseract to OCR the screenshot. Selenium/WebDriver is detectable by the more advanced bot detection software, hence the screenshot + OCR is used for collecting data and for locating clickables.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
StingyJack
  • 19,041
  • 10
  • 63
  • 122
  • Was the remote control package [PyAutoGUI](https://github.com/asweigart/pyautogui)? –  Apr 01 '23 at 17:26
  • 1
    @MrDeveloper - yes, edited the answer to include that and the other parts – StingyJack Apr 01 '23 at 20:14
  • @StingyJack - Hypothetically, how could someone bypass a `Cloudflare` firewall rule that blocks external access when sending `requests.post` to an endpoint even when you have valid header and cookies information that works 100% of the time when sending `requests.get` to the same endpoint? – Life is complex Apr 09 '23 at 17:12
  • @lifeiscomplex - you don't bypass someone else's firewall in a non hypothetical way that doesn't also constitute a criminal act (usually, and I wouldn't know how to do that anyway as I'm not a security professional.) – StingyJack Apr 10 '23 at 03:39
1

Cloudflare uses advanced technologies. It isn't just the request and header parameters it looks at. It looks at stuff like your browser agent, mouse position and even to your cookies.

For automating this kind of stuff I would use the browser Selenium and PyAutoGUI for mouse movements.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
1

UPDATED 04-10-2023

Back in December 2022, OpenAI deployed Cloudflare to protect ChatGPT from being abused from non-official means. OpenAI also started redesigning their service, which included changing endpoints that were being queried externally from Python scripts.

For example this endpoint, which you are trying to reach use to accept POST requests prior to these changes.

https://chat.openai.com/backend-api/conversations

This endpoint currently only accepts GET requests using the cf_clearance cookie and other header information extracted from an authorized Browser session.

When I tried to use these cookies with a POST request, I get the error message Access was denied Error code 1020 in the response text. This error message is a clear indication that OpenAI has enabled a Cloudflare firewall rule for POST requests to the endpoint in question, which is https://chat.openai.com/backend-api/conversations

Here is a Cloudflare reference for this error message.

The new endpoint is https://api.openai.com/v1/completions, which will except POST requests.

Currently you have at least 3 options to use ChatGPT with Python.

The first is to use selenium, which would allow you to interact with ChatGPT much like you using a browser session yourself.

The second option would be to use the new endpoint as show in the code below:

import requests

url = 'https://api.openai.com/v1/completions'
headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer YOUR_API_KEY'}
data = {'prompt': 'tell me about wine',
        'model': 'text-davinci-003',
        'temperature': 0.5,
        'max_tokens': 4000}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    # ChatGPT will provide a different response with each request.
    print(response.json())
else:
    print(f'Request failed with status code {response.status_code}')

The third option is to use the official ChatGPT API

Here is where you obtain an API key.

Here is the API Documentation

Here is the basic code needed to use the API:

import os
import requests

api_endpoint = "https://api.openai.com/v1/completions"
api_key = os.getenv("OPENAI_API_KEY")

request_headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

request_data = {
    "model": "text-davinci-003",
    "prompt": "What is the most popular programming language?",
    "max_tokens": 100,
    "temperature": 0.5
}

response = requests.post(api_endpoint, headers=request_headers, json=request_data)

# ChatGPT will provide a different response with each request.
if response.status_code == 200:
    response_text = response.json()["choices"][0]["text"]
    print(response_text)
    # The most popular programming language is currently JavaScript, followed by Python, Java, C/C++, and C#
else:
    print(f"Request failed with status code: {str(response.status_code)}")

Hopefully this information is useful to you. Happy coding.

Life is complex
  • 15,374
  • 5
  • 29
  • 58
  • 1
    Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253060/discussion-between-anm-and-life-is-complex). – Anm Apr 09 '23 at 04:57