Long story short, I'm not a coder. My team used to have a developer who wrote this Python/Selenium code to extract information from Echocardiography reports, either from the Chrome browser or from a downloaded mht file.

This code worked fine until recently, when it stopped working. The program still downloads the mht file via Chrome successfully. However, it fails to open the file, so the code continues without extracting any information, resulting in empty extractions.


This is the part I need help figuring out

    driver.get('chrome://downloads')
    # driver.get('file:///C:/Users/name/Downloads/')

    root1 = driver.find_element_by_tag_name('downloads-manager')
    shadow_root1 = expand_shadow_element(root1)

    time.sleep(2)

    root2 = shadow_root1.find_element_by_css_selector('downloads-item')
    shadow_root2 = expand_shadow_element(root2)

    time.sleep(1.5)

    openEchoFileButton = shadow_root2.find_element_by_id('file-link')
    mhtFileName = openEchoFileButton.text

    driver.get('file:///C:/Users/name/Downloads/' + mhtFileName)  # go to web page
    try:
        echoDateElement = WebDriverWait(driver, delay).until(
            EC.presence_of_element_located((By.XPATH, '/html/body/div[3]/p[1]/span[3]')))
    except TimeoutException:
        print("Loading page took too much time!")

I'm trying to figure out why it suddenly fails to open the downloaded mht files. The last time our team used this code was back in 2020, and it ran successfully. Were there any updates to Chrome, perhaps?

Help would be immensely appreciated. Thank you so much in advance.

1 Answer

There are three obvious weaknesses in this code. The first two are the calls to time.sleep() that wait for an element to appear and become usable. If the machine is busy doing something else, 2 or 1.5 seconds may not be enough. The right way to wait is to check repeatedly until the element is ready, and this code already contains a good example of that in its WebDriverWait() call. The third weakness is the locator passed to presence_of_element_located(). XPath locators rooted at "/html" are notoriously fragile, because small changes to the web page break them. Try to find something in the page that you can check with a more stable locator, ideally an element with an id attribute.
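As a rough illustration of the "check repeatedly" idea, here is a minimal sketch in plain Python, so it runs without a browser. The helper name wait_until and its parameters are made up for this example and are not part of Selenium; WebDriverWait(...).until(...) works on the same principle.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns something truthy.

    Hypothetical helper for illustration only (not a Selenium API).
    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Ready: return immediately instead of sleeping a fixed time.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Not ready yet: pause briefly, then check again.
        time.sleep(poll_interval)
```

Unlike a fixed time.sleep(2), this returns as soon as the condition holds and fails loudly when it never does. In Selenium itself, the first sleep could probably be replaced by something along the lines of WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.TAG_NAME, 'downloads-manager'))), reusing the delay variable from the question's code.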

Ross Patterson
  • Thank you so much for your response. Do you know where I can find some examples of changing time.sleep() to WebDriverWait(), and perhaps changing the locator to an element with an ID, when I need to open the latest downloaded file from Chrome (an mht file in this case)? Sorry, I may sound very confused - I am not familiar with Selenium at all... Would changing these help resolve the issue? – Sunghoon Minn Feb 14 '22 at 07:00
  • When this code was running successfully, I remember a page opening with the downloaded mht file, but currently it does not open any downloaded mht file at all. Does it need a fix before the try: part or after it? I am not sure what '/html/body/div[3]/p[1]/span[3]' is or what it is locating... – Sunghoon Minn Feb 14 '22 at 07:13
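
One way to sidestep the chrome://downloads page entirely is to look in the Downloads folder for the newest mht file and build the file:// URL from it. This is a different technique from the one in the question, not a confirmed fix for it. The function names, and the assumption that the reports end in .mht or .mhtml, are illustrative only.

```python
from pathlib import Path
from typing import Optional

# Assumption: downloaded reports use one of these extensions.
MHT_SUFFIXES = {".mht", ".mhtml"}


def latest_mht_file(download_dir: Path) -> Optional[Path]:
    """Return the most recently modified mht file in download_dir, or None."""
    candidates = [
        p for p in download_dir.iterdir()
        if p.is_file() and p.suffix.lower() in MHT_SUFFIXES
    ]
    if not candidates:
        return None
    # The newest file by modification time is taken to be the latest download.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def file_url(path: Path) -> str:
    """Build a file:// URL; spaces and similar characters are percent-encoded."""
    return path.resolve().as_uri()
```

Assuming Chrome saves to the default folder, calling latest_mht_file(Path.home() / 'Downloads'), checking that the result is not None, and then passing file_url(...) of it to driver.get() would open the report without reading the file name from Chrome's downloads page.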