
To capture headers, which Selenium itself does not support, I decided to use the Selenium Wire library. I found this page, https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/snippets/60, which explains how to use Selenium Wire with the Tor Browser. However, when I run the code from that page I get a connection error: "Error connecting to SOCKS5 proxy 127.0.0.1:9150: [WinError 10061]". I also can't get header capture working by following the Selenium Wire documentation (https://github.com/wkeeling/selenium-wire), which says it should be done like this:

def interceptor(request):
    del request.headers['Referer']  # Remember to delete the header first
    request.headers['Referer'] = 'some_referer'  # Spoof the referer

driver.request_interceptor = interceptor
driver.get(...)

# All requests will now use 'some_referer' for the referer

However, the documentation does not explain what the request argument is, or why the function is assigned as interceptor instead of being called as interceptor().
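The assignment in the documentation registers a callback: you hand the driver the function object, and the driver itself calls it later, once for every request the browser sends, passing in that request (an object with attributes such as url and headers). The sketch below imitates that pattern with two hypothetical stand-in classes, FakeDriver and FakeRequest, which are not part of Selenium Wire; it uses a plain dict for the headers, so the delete-first step is not needed here.

```python
from typing import Callable, Dict, List, Optional


class FakeRequest:
    """Hypothetical stand-in for a captured request: a URL plus a headers dict."""

    def __init__(self, url: str, headers: Dict[str, str]) -> None:
        self.url = url
        self.headers = headers


class FakeDriver:
    """Hypothetical stand-in for a driver that supports request interception."""

    def __init__(self) -> None:
        # Stores a function to call later; None means "do not intercept".
        self.request_interceptor: Optional[Callable[[FakeRequest], None]] = None
        self.sent: List[FakeRequest] = []

    def get(self, url: str) -> None:
        # Loading one page normally triggers several requests (HTML, CSS,
        # scripts, images); simulate two of them.
        for u in (url, url + "/style.css"):
            request = FakeRequest(u, {"Referer": "original"})
            if self.request_interceptor is not None:
                # The driver, not our code, calls the interceptor,
                # passing in each request object as it goes out.
                self.request_interceptor(request)
            self.sent.append(request)


def interceptor(request: FakeRequest) -> None:
    request.headers["Referer"] = "some_referer"


driver = FakeDriver()
driver.request_interceptor = interceptor  # the function object, not interceptor()
driver.get("https://example.com")
print([r.headers["Referer"] for r in driver.sent])  # ['some_referer', 'some_referer']
```

Writing driver.request_interceptor = interceptor() instead would call the function immediately, with no request to pass in, so Python would raise a TypeError about the missing request argument.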

Olgierd Wiśniewski

1 Answer


As for the proxy settings from the example: they only work if Tor Browser is already running, because Selenium Wire forwards its traffic to the SOCKS5 proxy that Tor Browser opens on 127.0.0.1:9150. If nothing is listening on that port, the connection is refused, which is what WinError 10061 means. In the code below, the script starts Tor Browser itself. For capturing headers, follow the Selenium Wire documentation exactly. Below is a working script that captures headers:
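Because the refused connection simply means that nothing is listening on 127.0.0.1:9150 yet, it can be checked before the driver is created. The helper below, wait_for_port, is a hypothetical addition (it is not part of Selenium Wire or Tor Browser): it polls until the host and port accept a TCP connection or a timeout expires. An open port only shows that the SOCKS proxy is up; it does not guarantee that Tor has finished connecting to the network.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True as soon as host:port accepts a TCP connection,
    or False if that has not happened within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (ConnectionRefusedError, i.e. WinError 10061
            # on Windows) while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)


# Hypothetical usage, right after Tor Browser has been started:
# if not wait_for_port('127.0.0.1', 9150):
#     raise RuntimeError('Nothing is listening on 127.0.0.1:9150')
```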

import os
import time

from seleniumwire import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

def firefoxdriver(my_url):
    """Prepare Tor Browser for use with Selenium Wire."""
    # The location of the Tor Browser bundle
    #   for my laptop.
    # tbb_dir = r'C:\Users\Oliver\Desktop\Tor Browser'
    #   for my mainframe.
    tbb_dir = r'C:\Users\olive\OneDrive\Pulpit\Tor Browser'

    # Set the Tor Browser binary and profile.
    tb_binary = tbb_dir + r'\Browser\firefox.exe'
    tb_profile = tbb_dir + r'\Browser\TorBrowser\Data\Browser\profile.default'
    binary = FirefoxBinary(tb_binary)
    profile = FirefoxProfile(tb_profile)

    # Start Tor Browser first, so that its Tor SOCKS5 proxy is
    #   listening on 127.0.0.1:9150 before seleniumwire connects to it.
    torexe = os.popen(tb_binary)

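    # Disable Tor Launcher in the browser that Selenium starts, so it
    #   does not connect to Tor directly. These variables are set after
    #   the launch above, so they only affect processes started later.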
    os.environ['TOR_SKIP_LAUNCH'] = '1'
    os.environ['TOR_TRANSPROXY'] = '1'

    # Disable certificate pinning and the HSTS preload list so that
    #   seleniumwire, which intercepts HTTPS traffic, can sit between
    #   the browser and Tor.
    profile.set_preference("security.cert_pinning.enforcement_level", 0)
    profile.set_preference("network.stricttransportsecurity.preloadlist", False)

    # Tell Tor Button it is OK to use seleniumwire
    profile.set_preference("extensions.torbutton.local_tor_check", False)
    profile.set_preference("extensions.torbutton.use_nontor_proxy", True)

    # Enable JavaScript at all, otherwise JS stays disabled regardless 
    #   of the Tor Browser's security slider value.
    profile.set_preference("browser.startup.homepage_override.mstone", "68.8.0")

    # Configure seleniumwire to send its upstream traffic to Tor
    #   running on port 9150.
    # It is possible to increase/decrease the timeout (in seconds) if
    #   you are trying to load a page that makes a lot of requests.
    options = {
        'proxy': {
            'http': 'socks5h://127.0.0.1:9150',
            'https': 'socks5h://127.0.0.1:9150',
            'connection_timeout': 20
        }
    }

    driver = webdriver.Firefox(firefox_profile=profile,
                                firefox_binary=binary,
                                seleniumwire_options=options)

    return driver

def interceptor(request):
    """
    Request interceptor: Selenium Wire calls this function for every
    request the browser sends and passes the request object to it.
    """
    # Delete each header before setting it; otherwise a second header
    #   with the same name would be added.
    del request.headers['User-Agent']
    request.headers['User-Agent'] = ('Mozilla/5.0 (Windows NT 10.0; rv:102.0)'
                                     ' Gecko/20100101 Firefox/102.0')
    del request.headers['Accept']
    request.headers['Accept'] = ('text/html,application/xhtml+xml,application'
                                 '/xml;q=0.9,image/avif,image/webp,*/*;q=0.8')
    del request.headers['Accept-Language']
    request.headers['Accept-Language'] = 'en-US,en;q=0.5'

# Variable with the URL of the website.
my_url = 'https://httpbin.org/headers'

# Prepare Tor Browser.
driver = firefoxdriver(my_url)

# Register the interceptor on the driver. Assign the function itself
#   (no parentheses): Selenium Wire calls it later, once per request.
driver.request_interceptor = interceptor

# Load the page; every request it triggers passes through the interceptor.
driver.get(my_url)

# Access requests via the `requests` attribute.
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type'],
            request.headers
        )

time.sleep(15)
driver.quit()
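The delete-then-assign pattern in the interceptor follows the "Remember to delete the header first" note in the Selenium Wire documentation: HTTP allows the same header name to appear more than once, so assigning does not overwrite. The sketch below demonstrates this with the standard library's http.client.HTTPMessage; it assumes, rather than confirms, that Selenium Wire's request.headers behaves like this class.

```python
from http.client import HTTPMessage

headers = HTTPMessage()
headers['Referer'] = 'https://original.example/'

# Assigning to a name that already exists adds a second header;
# it does not replace the first one.
headers['Referer'] = 'some_referer'
print(headers.get_all('Referer'))  # ['https://original.example/', 'some_referer']

# del removes every header with that name (and does not raise if there
# is none), so delete-then-assign leaves exactly one value.
del headers['Referer']
headers['Referer'] = 'some_referer'
print(headers.get_all('Referer'))  # ['some_referer']

del headers['X-Not-Present']  # no KeyError for a missing header
```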
Olgierd Wiśniewski