Why doesn't this code successfully bypass a captcha?

Question

This is my code. I'm using twocaptcha, requests, bs4, and fake_user_agent. The code is supposed to log in to the site using the requests.post method, but something goes wrong. The code doesn't output any errors and the response status is 200, but it doesn't actually do what it's supposed to do.

import time
from twocaptcha import TwoCaptcha
import requests
from bs4 import BeautifulSoup
from fake_user_agent.main import user_agent

# Captcha
config = {
            'server':           'rucaptcha.com',
            'apiKey':           'API',
            'defaultTimeout':    120,
            'recaptchaTimeout':  600,
            'pollingInterval':   10,
        }
solver = TwoCaptcha(**config)
print(solver.balance())

result = solver.recaptcha(sitekey='6LdTYk0UAAAAAGgiIwCu8pB3LveQ1TcLUPXBpjDh',
  url='https://funpay.com/account/login',
  param1=...)
result = result["code"]
print(result)

# Spoof the User-Agent
site = "https://funpay.com/account/login"
user = user_agent()
header = {
    "user-agent": user
}

# Look up the csrf token
r = requests.get(site)
soup = BeautifulSoup(r.text, "lxml")
csrf = soup.find("body").get("data-app-data").split('"')[3]
print(csrf)

# Keys (form fields)
data = {
    "login": login,
    "password": pasword,
    "csrf_token": csrf,
    "g-recaptcha-response": result
}
print(data)
link = "https://funpay.com/chat/"
session = requests.Session()
session.headers = header
session.get(site)
responce = session.post(url=link, data=data, headers=header)
print(responce.text)

# Parsing
link = "https://funpay.com/chat/"
k = session.get(link, headers=header).text

What exactly do you mean by "the code does not fulfill its duties"? What is the code expected to do? What does it do instead? — ForceBru, Jul 11 '22 at 19:05
The code does not log in to the site, even though the captcha is solved correctly — Влад Даниленко, Jul 11 '22 at 19:24

score 0 · Answer 1 · answered Jul 11 '22 at 21:08

It's difficult to give you a 100% certain answer, because several things could be a factor: how the site handles everything from timing to CSRF generation, and the fact that you can't really verify independently whether the reCAPTCHA response is correct. That said, unless the site is so slow that 30 seconds to a minute pass between the reCAPTCHA being solved and the form being submitted, I suspect the problem is that you call stateless instances of requests several times before initializing the Session object. That can easily leave the server associating you, by the end, with a session other than the one tied to the PHPSESSID cookie you started with.
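To see the effect of stateless calls versus a single Session, here is a small self-contained sketch. It runs against a toy local server (an assumption standing in for the real site, not funpay.com's actual logic) that hands a brand-new PHPSESSID to any request that arrives without one.

```python
import http.server
import itertools
import threading
from http.cookies import SimpleCookie

import requests

# Toy stand-in for the real site (assumption): any request without a
# PHPSESSID cookie gets a brand-new session ID.
_new_ids = itertools.count(1)


class SessionHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        cookies = SimpleCookie(self.headers.get("Cookie", ""))
        self.send_response(200)
        if "PHPSESSID" in cookies:
            sid = cookies["PHPSESSID"].value  # returning visitor: same session
        else:
            sid = str(next(_new_ids))  # unknown visitor: start a new session
            self.send_header("Set-Cookie", f"PHPSESSID={sid}; Path=/")
        body = sid.encode()
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SessionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/account/login"

# Stateless calls, as in the question: each request arrives without a cookie.
stateless = [requests.get(url).text for _ in range(2)]

# A single Session stores the cookie from the first response and resends it.
with requests.Session() as session:
    persistent = [session.get(url).text for _ in range(2)]

server.shutdown()
server.server_close()
print("stateless calls:", stateless)
print("one session:    ", persistent)
```

The two stateless calls each get a different session ID, while both calls made through the Session see the same one, which is the behavior the real login flow depends on.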

The safer workflow would be something like:

import requests
import bs4
from twocaptcha import TwoCaptcha


session = requests.Session() 
headers = {"user-agent": "blahblahblah agent"}
session.headers.update(headers)

Since your user agent isn't going to change from here on, you can set it on the session once. Any additional headers that may change will be added later, either automatically or by hand, but this way you literally stay on the same page as the server while you solve the reCAPTCHA.
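A quick way to check which headers a session will actually send is `Session.prepare_request()`, which merges session-level and per-request headers without any network I/O. This sketch compares `session.headers.update()` with plain assignment (as in the question's `session.headers = header`); the user-agent string and the Referer header are placeholders.

```python
import requests

UA = "blahblahblah agent"  # placeholder user agent, as in the answer

# update() sets the user agent on top of requests' default headers.
updated = requests.Session()
updated.headers.update({"user-agent": UA})

# Plain assignment replaces the default headers entirely.
replaced = requests.Session()
replaced.headers = {"user-agent": UA}

# Per-request headers (here a placeholder Referer) are merged on top.
req = requests.Request("GET", "https://funpay.com/account/login",
                       headers={"Referer": "https://funpay.com/"})
p_updated = updated.prepare_request(req)
p_replaced = replaced.prepare_request(req)

print(p_updated.headers["User-Agent"], p_updated.headers["Referer"])
print("Accept-Encoding" in p_updated.headers,
      "Accept-Encoding" in p_replaced.headers)
```

Both sessions send the custom user agent, but only the `update()` version keeps defaults such as `Accept-Encoding`.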

r = session.get("https://funpay.com/account/login")

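soup = bs4.BeautifulSoup(r.content, "html.parser")
# find(name=...) matches the tag name, so match the input's name attribute instead
csrf = soup.find("input", attrs={"name": "csrf_token"})["value"]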

Even though the page's own code gets the CSRF token from elsewhere, grabbing it from the form that will be submitted is cleaner and less error-prone than splitting a string (which may be URL-encoded and so on), but that's up to you. I would initialize the solver object here, though. Your session is still open, and the session object shouldn't make any further requests until the login POST. This should be the only time you request the login page in this session.
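Here is a sketch of the two parsing options on made-up markup (the HTML below, including the `"csrf-token"` key inside `data-app-data`, is an assumption for illustration; the real page may differ). It also shows why the question's `split('"')[3]` is fragile: it depends on the position of the value, not its key.

```python
import json

from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real funpay.com page may differ.
HTML = """
<body data-app-data='{"locale":"en","csrf-token":"abc123"}'>
  <form method="post" action="/account/login">
    <input type="hidden" name="csrf_token" value="abc123">
    <input name="login"><input name="password" type="password">
  </form>
</body>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Preferred: read the hidden input of the form you are about to submit.
form_token = soup.find("input", attrs={"name": "csrf_token"})["value"]

# Alternative: parse the attribute as JSON and look the value up by key
# ("csrf-token" is an assumed key name).
app_data = json.loads(soup.body["data-app-data"])
json_token = app_data["csrf-token"]

# The question's approach takes the 4th quote-delimited piece; with this
# sample's key order it returns the locale, not the token.
split_token = soup.body["data-app-data"].split('"')[3]

print(form_token, json_token, split_token)
```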

solver = TwoCaptcha("apikey")
response = solver.recaptcha(sitekey="6LdTYk0UAAAAAGgiIwCu8pB3LveQ1TcLUPXBpjDh", url="https://funpay.com/account/login", json=1)

By the way, your API key should work with either the 2captcha or the rucaptcha backend, although, annoyingly, if you have accounts on both there's no easy way to tell which one is denominated in which currency without using it. It should work the same either way. Once the captcha response comes in, you can build your payload and submit it.
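Because timing can matter too, one option is to record when the token arrived and refuse to submit a stale one. Google documents reCAPTCHA response tokens as valid for two minutes; the safety margin, the function names, and the injected `solve` callable below are hypothetical choices for this sketch, not part of the twocaptcha library.

```python
import time

# reCAPTCHA response tokens are documented as valid for two minutes;
# the safety margin is an arbitrary assumption.
TOKEN_LIFETIME = 120
SAFETY_MARGIN = 15


def solve_with_timestamp(solve):
    """Call a captcha solver and remember when its answer arrived.

    `solve` is any zero-argument callable returning the token, for example a
    hypothetical wrapper around solver.recaptcha(...)["code"].
    """
    token = solve()
    return token, time.monotonic()


def build_login_payload(csrf, login, password, token, solved_at, now=None):
    """Build the login form data, refusing tokens that are probably expired."""
    now = time.monotonic() if now is None else now
    if now - solved_at > TOKEN_LIFETIME - SAFETY_MARGIN:
        raise RuntimeError("reCAPTCHA token is probably expired; solve it again")
    return {
        "csrf_token": csrf,
        "login": login,
        "password": password,
        "g-recaptcha-response": token,
    }
```

For example, `token, t = solve_with_timestamp(lambda: "fake-token")` followed by `build_login_payload(csrf, "me", "secret", token, t)` yields the same form fields the answer posts, as long as the POST follows promptly.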

data = {"csrf_token": csrf, "login": yourlogin, "password": yourpassword, "g-recaptcha-response": response["code"]}
soutput = session.post("https://funpay.com/account/login", data=data)
print(soutput.text)

If you can reasonably be sure that the reCAPTCHA is being solved correctly, I suspect your problem is that you don't have a persistent session until it's too late, so the server treats your requests as new ones and assigns you new cookies.
