
Recently, a scraper I made stopped working in headless mode. I've tried both Firefox and Chrome. Notable details: I am using selenium-wire to access API requests, and I am using ChromeDriverManager to fetch the driver. The current version is Chrome/93.0.4577.63.

I've tried modifying the User-Agent manually, as can be seen in the code below, in case the website added checks blocking HeadlessChrome/93.0.4577.63, which is the original User-Agent. This did not help.

When running the script in regular mode, it works. When running in headless mode, the code below outputs [], meaning that driver.get(url) does not capture any requests. I run this code daily and it stopped working on 8.9.2021, I think, during the day.

from selenium.webdriver.chrome.options import Options as chromeOptions
from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# selenium-wire specific options
options = {
    'suppress_connection_errors': False,
    'connection_timeout': None  # never time out waiting for an upstream connection
}

chrome_options = chromeOptions()
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--incognito")
chrome_options.add_argument('--log-level=2')
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument('--allow-running-insecure-content')
chrome_options.add_argument('--headless')

driver = webdriver.Chrome(
    ChromeDriverManager().install(),
    seleniumwire_options=options,
    options=chrome_options  # 'chrome_options' is a deprecated alias for 'options'
)

# Strip 'Headless' from the reported User-Agent and push the override via CDP
# (execute_cdp_cmd is Chrome-only; it is not available on the Firefox driver).
userAgent = driver.execute_script("return navigator.userAgent;")
userAgent = userAgent.replace('Headless', '')
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": userAgent})

url = 'my URL goes here'
driver.get(url)
print(driver.requests)  # prints [] in headless mode
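
To rule out a pure timing issue, selenium-wire's driver.wait_for_request can be used instead of printing driver.requests immediately; it blocks until a request matching the given pattern is captured or the timeout expires. A minimal sketch ('/api/' is a placeholder pattern, not the real endpoint):

from selenium.common.exceptions import TimeoutException

try:
    # Block for up to 30 s waiting for any captured request whose URL matches
    request = driver.wait_for_request('/api/', timeout=30)
    print(request.url)
except TimeoutException:
    print('no matching request was captured at all')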

The same issue occurs with Firefox: headless does not work, but regular browsing does. Any idea what might cause this problem and what could solve it? I've also tried adding the following arguments to the Chrome options, without any luck (the equivalent Firefox setup is sketched after the argument list):

chrome_options.add_argument("--proxy-server='direct://'")
chrome_options.add_argument("--proxy-bypass-list=*")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--ignore-certificate-errors-spki-list')
chrome_options.add_argument('--ignore-ssl-errors')
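
For reference, a minimal sketch of the equivalent Firefox setup (assuming geckodriver is fetched via GeckoDriverManager and the same selenium-wire options dict as above; the exact arguments may differ from what I ran):

from selenium.webdriver.firefox.options import Options as firefoxOptions
from seleniumwire import webdriver
from webdriver_manager.firefox import GeckoDriverManager

firefox_options = firefoxOptions()
firefox_options.add_argument('--headless')
firefox_options.add_argument('--width=1920')
firefox_options.add_argument('--height=1080')

driver = webdriver.Firefox(
    executable_path=GeckoDriverManager().install(),
    seleniumwire_options=options,  # same dict as in the Chrome example
    options=firefox_options
)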

1 Answer


This may have been solved. I noticed that I first set the window size to maximized and after that set it to 1920,1080. When I removed the maximize argument, chrome_options.add_argument("--start-maximized"), the problem disappeared and the script works once again.

I'm not sure whether this actually solved it or whether it was something else, since Selenium is a bit finicky and sometimes data just won't load the same way for the same web page, but at least it works now.
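
For completeness, the working option setup is the same as in the question, minus the maximize flag:

chrome_options = chromeOptions()
# chrome_options.add_argument("--start-maximized")  # removed: conflicted with the fixed window size
chrome_options.add_argument("--incognito")
chrome_options.add_argument('--log-level=2')
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument('--allow-running-insecure-content')
chrome_options.add_argument('--headless')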
