I have been scouring the internet for a way to make my Selenium web crawler work with proxies. My system is macOS High Sierra 10.13.6.
I'm currently on Selenium 4.10 and Python 3.11.4, and have written some code to log into my web page. I tested both Undetected ChromeDriver and ChromeDriver 114.0.5735.90, with and without proxies, to see the difference in automation.
This is the option argument I've been using with ChromeDriver 114:

    opts.add_argument(f'--proxy-server=https://{proxy_use}')
Each entry in my proxy array holds both the IP and the port. I pass those variables to a class that executes the Selenium code. Here is the ChromeDriver 114 snippet first. The driver is created inside an if statement so that the driver options and service, including the initial proxy, are set up once, when the first (randomly chosen) index variable i is zero. The index variable ii is the proxy array index. The driver from the setup function is passed into the class, which is called in the main body of the for loop.

    for x in range(0, i):
        if choice == '1':
            # Wrap around to the first proxy once the list is exhausted
            if ii == len(arr):
                print("Recycle Proxies...")
                ii = 0
            if Choice == 'Y' and i > 0:
                proxy_use = str(arr[ii])
                opts.add_argument(f'--proxy-server=https://{proxy_use}')
                print("\nProxy " + str(arr[ii]))
            # First pass: build the options and service, then start the driver
            if i == 0:
                url = "www.example.com"
                opts, s = drive_setup(opts, Choice, arr, ii)
                driver = webdriver.Chrome(service=s, options=opts)
                driver.get(url)
            Bot(driver, arr_U, arr_P, i, w_file)
            time.sleep(5)
            # Open a new tab, load the page there, then close the old tab
            driver.execute_script("window.open('');")
            driver.switch_to.window(driver.window_handles[1])
            driver.get(url)
            driver.switch_to.window(driver.window_handles[0])
            driver.close()
            driver.switch_to.window(driver.window_handles[0])

The error I get is ERR_TIMED_OUT.
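For reference, Chrome reads --proxy-server only when the browser is launched, so arguments added to the options after webdriver.Chrome(...) has been called do not reach the running session. A minimal sketch of the rotation with one fresh options object and one driver per proxy could look like the following. It is an illustration under assumptions, not my working code: next_index, proxy_argument, new_driver and visit_with_rotation are hypothetical helper names, the default driver path is a placeholder, and the http:// scheme assumes the entries are ordinary HTTP proxies (with https://, Chrome expects to speak TLS to the proxy itself).

```python
def next_index(ii, proxies):
    """Advance the proxy index, wrapping to 0 at the end of the list."""
    return (ii + 1) % len(proxies)


def proxy_argument(proxy, scheme="http"):
    """Build the Chrome switch for a proxy given as 'IP:PORT'."""
    return f"--proxy-server={scheme}://{proxy}"


def new_driver(proxy, driver_path="chromedriver"):
    """Start a fresh Chrome session that uses `proxy` ('IP:PORT')."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    opts = webdriver.ChromeOptions()  # new options object for every session
    opts.add_argument(proxy_argument(proxy))
    return webdriver.Chrome(service=Service(driver_path), options=opts)


def visit_with_rotation(proxies, url, visits):
    """Load `url` `visits` times, switching proxy between sessions."""
    ii = 0
    for _ in range(visits):
        driver = new_driver(proxies[ii])
        try:
            driver.get(url)
        finally:
            driver.quit()  # end this session before moving to the next proxy
        ii = next_index(ii, proxies)
```

Calling visit_with_rotation(arr, "https://www.example.com", 3) would then start three separate browser sessions, each bound to the next proxy in arr.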
Here are my driver options:

    def drive_setup(opts, Choice, arr, ii):
        header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
                  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                  'Accept-Language': 'en-US,en;q=0.9',
                  'Accept-Encoding': 'gzip, deflate, br'}
        #opts.add_argument('--no-sandbox')
        # Pass only the User-Agent string; the switch does not take the whole dict
        opts.add_argument(f"user-agent={header['User-Agent']}")
        opts.add_argument('--disable-blink-features')
        opts.add_argument('--disable-blink-features=AutomationControlled')
        opts.add_argument('--window-size=1920,1080')
        opts.add_argument('--disable-gpu')
        opts.add_argument('--allow-running-insecure-content')
        opts.add_argument("--disable-dev-shm-usage")
        if Choice == 'Y':
            proxy_use = str(arr[ii])
            opts.add_argument(f'--proxy-server=https://{proxy_use}')
            print("\nProxy " + str(arr[ii]))
        #opts.add_argument("--headless")
        s = Service('driver_chrome.exe')
        return opts, s

With Undetected ChromeDriver I noticed differences in the options and in how proxies are used, since the options object becomes opts = uc.ChromeOptions().
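As a rough sketch of the Undetected ChromeDriver side, the options could be built fresh for each driver, since as far as I can tell uc refuses to reuse a ChromeOptions object for a second driver. The helper names make_uc_options and make_uc_driver are hypothetical, the uc_module parameter exists only so the helper can be tried with a stand-in object, and the http:// scheme is again an assumption about the proxy type.

```python
def make_uc_options(proxy=None, uc_module=None):
    """Return a fresh ChromeOptions, optionally with a proxy ('IP:PORT').

    uc_module is a hypothetical hook for passing a stand-in; by default
    the real undetected_chromedriver package is imported.
    """
    if uc_module is None:
        import undetected_chromedriver as uc_module
    opts = uc_module.ChromeOptions()
    opts.add_argument("--window-size=1920,1080")
    if proxy:
        # Assumes a plain HTTP proxy; adjust the scheme if yours differs.
        opts.add_argument(f"--proxy-server=http://{proxy}")
    return opts


def make_uc_driver(proxy=None):
    """Start Undetected ChromeDriver with its own, newly built options."""
    import undetected_chromedriver as uc

    return uc.Chrome(options=make_uc_options(proxy, uc_module=uc))
```

Each call to make_uc_driver builds its own options, so switching proxy means quitting the old driver and calling it again with the next array entry.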
While troubleshooting on GitHub I saw that people get proxies working with the pproxy package, so I tried that too: I ran a local server that forwards to the given remote HTTPS proxy, using self-signed certificates, and passed the --sys and -vv flags so that the system proxy, which I know Google Chrome uses, is set as well.
The Undetected ChromeDriver snippet below fails with ERR_PROXY_NOT_SUPPORTED. It uses pproxy, and the command listens on two ports because the website runs over both port 80 (HTTP) and port 443 (HTTPS).

    for x in range(0, i):
        proxy_server = '127.0.0.1'
        lport1 = 443
        lport2 = 80
        if choice == '1':
            if ii == len(arr):
                print("Recycle Proxies...")
                ii = 0
            if Choice == 'Y' and i > 0:
                opts.add_argument(f'--proxy-server=https://{proxy_server}:{lport1}')
                print("\nProxy " + str(arr[ii]))
            if i == 0:
                url = "www.example.com"
                opts, s = drive_setup(opts, Choice, arr, ii)
                driver = webdriver.Chrome(service=s, options=opts)
            # Local pproxy listeners (HTTPS and HTTP) forwarding to the remote proxy
            pproxy_command = f"pproxy -l https://{proxy_server}:{lport1} -l http://{proxy_server}:{lport2} --ssl -cert server.cert,server.key -r https://{proxy_use} --sys -vv"
            process = subprocess.Popen(pproxy_command.split())
            driver.get(url)
            Bot(driver, arr_U, arr_P, i, w_file)
            time.sleep(5)
            driver.execute_script("window.open('');")
            driver.switch_to.window(driver.window_handles[1])
            driver.get(url)
            driver.switch_to.window(driver.window_handles[0])
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
            process.terminate()

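One variation I have not verified against my proxies is a single local HTTP listener on an unprivileged port: an HTTP proxy normally carries HTTPS pages through CONNECT, so separate listeners on 80 and 443 may not be needed, and ports below 1024 typically require elevated privileges. The sketch below only builds the command and the matching Chrome switch. The helper names, the port 8899 and the remote URI format are assumptions, and only the -l, -r and -vv flags already used above appear in it.

```python
import shlex
import subprocess


def pproxy_command(remote, local_port=8899, host="127.0.0.1"):
    """Argument list for one local HTTP listener that forwards to `remote`.

    `remote` is a proxy URI such as 'http://IP:PORT' (assumed format).
    """
    return ["pproxy", "-l", f"http://{host}:{local_port}", "-r", remote, "-vv"]


def chrome_argument(local_port=8899, host="127.0.0.1"):
    """Chrome switch that points the browser at the local listener."""
    return f"--proxy-server=http://{host}:{local_port}"


def start_pproxy(remote, local_port=8899):
    """Launch pproxy in the background; the caller must terminate() it."""
    cmd = pproxy_command(remote, local_port)
    print("Starting:", shlex.join(cmd))
    return subprocess.Popen(cmd)
```

The order matters in this variant: chrome_argument() has to be added to the options before the driver is created, and start_pproxy() has to run before the first driver.get(url).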
My program works without proxies, but I want to find the best way to add them with bot detection in mind.
- With the current version of Selenium, what is the best way to implement proxies so that the crawler is not flagged by bot detection?
- For use with proxies, is Undetected ChromeDriver or the stock ChromeDriver 114.x the better choice against bot detection?
- Are the imports sufficient for the end goal of running everything through the class (module)?
I tested with multiple proxies. I tried:
- Reformatting the proxies, using different proxies, and checking the system auto-proxy discovery setting in macOS. With the original method I got the timeout error.
- Using pproxy to create local HTTP and HTTPS listeners that forward to the remote proxy taken from the array, together with Undetected ChromeDriver.
- Starting pproxy as a parallel process when the class is called, which resulted in tunnel connection errors.