I’m having trouble saving files from a website using Python and the Selenium library. My Python script builds a list of URLs from a page, then saves the files those URLs reference to a local folder. At least that’s my intent. In my testing, the first of two files appears promptly in my Downloads folder, not the folder I specified. Then something hangs, and I eventually get this traceback:
(Pdb) continue
download_files(our_urls, driver, SCRATCH_FOLDER)
File "C:\Users\brussell\ScanAndReplace\web_driver.py", line 60, in download_files
driver.get(thisURL)
File "C:\Users\brussell\ScanAndReplace\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 449, in get
self.execute(Command.GET, {"url": url})
File "C:\Users\brussell\ScanAndReplace\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 440, in execute
self.error_handler.check_response(response)
File "C:\Users\brussell\ScanAndReplace\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 245, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Navigation timed out after 300000 ms
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:187:5
TimeoutError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:674:5
bail@chrome://remote/content/marionette/sync.sys.mjs:213:19
The file for the second URL in my list never appears, either in my Downloads folder or in the folder I specified. Here is my code for preparing the browser object:
options = webdriver.FirefoxOptions()
options.set_preference("browser.download.folderList", '2')
options.set_preference("browser.download.manager.showWhenStarting", False)
options.set_preference("browser.download.dir", SCRATCH_FOLDER)
options.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream")
driver = webdriver.Firefox(options=options)
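For comparison, here is a minimal sketch of the same setup with the preference values typed the way Firefox appears to expect them. It rests on one assumption worth checking: that `browser.download.folderList` is an integer preference (2 meaning "use `browser.download.dir`"), so passing the string '2' may be ignored and leave downloads going to the default folder. The helper names `build_download_prefs` and `make_firefox_driver` are hypothetical, not part of Selenium.

```python
import os


def build_download_prefs(download_dir):
    """Return Firefox download preferences as a plain dict (hypothetical helper)."""
    return {
        # Assumption: integer pref; 2 = save to the folder named in browser.download.dir.
        "browser.download.folderList": 2,
        # An absolute path avoids ambiguity about the browser's working directory.
        "browser.download.dir": os.path.abspath(download_dir),
        # MIME types that should be saved without showing a dialog.
        "browser.helperApps.neverAsk.saveToDisk": "application/octet-stream",
    }


def make_firefox_driver(download_dir):
    """Create a Firefox driver that uses the preferences above (hypothetical helper)."""
    # Imported here so the preference dict can be inspected without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in build_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Calling `make_firefox_driver(SCRATCH_FOLDER)` would stand in for the four `set_preference` lines above; printing `build_download_prefs(SCRATCH_FOLDER)` first is a cheap way to confirm the types before launching the browser.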
This is the function that writes the files to disk:
def download_files(urls, driver, someFolder):
    breakpoint()
    n_files_to_process = len(urls)
    for thisURL in urls:
        driver.get(thisURL)
    while len(os.listdir(someFolder)) < n_files_to_process:
        wait(5)
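One possible reading of the traceback is that `driver.get` blocks because a URL that triggers a download never finishes "loading" as a page, so the call waits until the 300000 ms navigation timeout. A rough sketch of a variant that tolerates this is below. It assumes a shorter page-load timeout is acceptable, that the driver remains usable after a `TimeoutException`, and that Firefox marks in-progress downloads with a `.part` suffix; the helpers `completed_files` and `wait_for_files` are hypothetical names, and counting files is only a heuristic for "the download finished".

```python
import os
import time


def completed_files(folder):
    """List entries in folder, skipping names that end in '.part'."""
    return [name for name in os.listdir(folder) if not name.endswith(".part")]


def wait_for_files(folder, expected_count, timeout=60.0, poll=0.5):
    """Poll until folder holds at least expected_count finished files.

    Returns True on success and False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if len(completed_files(folder)) >= expected_count:
            return True
        time.sleep(poll)
    return len(completed_files(folder)) >= expected_count


def download_files(urls, driver, folder, page_timeout=10):
    """Visit each URL and wait for one new file per URL to appear in folder."""
    from selenium.common.exceptions import TimeoutException

    # Give up on "page load" quickly; a download URL may never report completion.
    driver.set_page_load_timeout(page_timeout)
    already_there = len(completed_files(folder))
    for index, url in enumerate(urls, start=1):
        try:
            driver.get(url)
        except TimeoutException:
            # Assumption: the download was still started even though navigation timed out.
            pass
        if not wait_for_files(folder, already_there + index):
            raise RuntimeError(f"No new file appeared for {url}")
```

Waiting after each URL, rather than once after the loop, makes it easier to see which URL failed to produce a file.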
I don’t know why the program hangs after the first file is written to disk. I set a breakpoint but was not able to isolate the problem. In this test there should have been two files to download. About the line:
options.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream")
Is this absolutely necessary? Should I add a similar line for every file type I’m likely to encounter?
In this program I am collecting data from online courses hosted in the Canvas LMS. A small part of this involves parts of the course that are not exposed by the Canvas API. I don’t seem to be able to use the requests library while accessing Canvas. For details on this wrinkle, see my previous posting.
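On the question of one line per file type: as far as I understand, `browser.helperApps.neverAsk.saveToDisk` takes a single comma-separated list of MIME types, so one call can cover several types. The sketch below builds such a list by guessing types from the file extensions in the URLs, using only the standard library. The function name `never_ask_value` is hypothetical, and the guess may not match what the server actually sends in its Content-Type header, particularly for URLs with no extension.

```python
import mimetypes
from urllib.parse import urlparse


def never_ask_value(urls, fallback="application/octet-stream"):
    """Build a comma-separated MIME list guessed from each URL's file extension."""
    types = {fallback}  # always keep the generic binary type
    for url in urls:
        # Only the path is examined, so query strings do not confuse the guess.
        guessed, _encoding = mimetypes.guess_type(urlparse(url).path)
        if guessed:
            types.add(guessed)
    return ",".join(sorted(types))
```

The result could then be passed in a single call, for example `options.set_preference("browser.helperApps.neverAsk.saveToDisk", never_ask_value(our_urls))`.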