
I am trying to scrape certain information from a webpage [https://reality.idnes.cz/rk/detail/m-m-reality-holding-a-s/5a85b582a26e3a321d4f2700/]. I have a list of links and I need to go through all of them. Each link contains the same kind of information about a different company. However, a few of the companies didn't add a phone number, for example, and when that happens the whole program is terminated with an exception. This is my code:

    for link in link_list:
        try:
            driver.get(', '.join(link))
            time.sleep(2)
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
            title = driver.find_element_by_css_selector("h1.b-annot__title.mb-5")
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
            offers = driver.find_element_by_css_selector("span.btn__text")
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
            addresses = driver.find_element_by_css_selector("p.font-sm")
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon.measuring-data-layer")))
            phone_number = driver.find_element_by_css_selector("a.item-icon.measuring-data-layer")
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon")))
            email = driver.find_element_by_css_selector("a.item-icon")
            print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number.text, " ", email.text)
            writer.writerow([title.text, offers.text, addresses.text, phone_number.text, email.text])
        except Exception:
            print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number.text, " ", email.text)
            writer.writerow([title.text, offers.text, addresses.text, phone_number.text, email.text])
            continue

As you can see, I am trying to prevent the whole program from terminating. I added the except/continue so the program wouldn't get terminated, but I noticed that even though the program no longer terminates, none of the information is scraped from the webpages on which the exception occurred. I am trying to prevent that loss of data by requesting the desired output once again inside the except block, so it would get printed out and saved into the CSV file with the missing information as an empty slot.

However, the whole problem is that when I request the output inside the except block, a second exception once again terminates the whole program instead of printing out what it knows and moving on with "continue". Now my question is: why is that happening? Why doesn't the program print out the output, follow the "continue", and carry on, instead of terminating itself? How can one print the output the program did get, without the missing information, and prevent the program from terminating?

  • Are you getting any traceback at the end or the program just stops? Also, if you add another print statement below the one in the except block, which just prints a string, does it work? – Rolv Apneseth Nov 19 '20 at 00:32

1 Answer


When it hits the exception, execution jumps straight to the except block, and `continue` then moves on to the next iteration of the loop. It will NOT pick up where the exception occurred. See this answer: https://stackoverflow.com/a/19523054/1387701

No, you cannot do that. That's just the way Python has its syntax. Once you exit a try-block because of an exception, there is no way back in.
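A minimal sketch (plain Python, no Selenium needed) shows what `continue` actually does here: once the exception fires, the rest of the try block is skipped for good, and `continue` simply starts the next iteration:

```python
for i in range(3):
    try:
        print("before", i)
        if i == 1:
            raise ValueError("boom")
        print("after", i)  # never runs for i == 1
    except ValueError:
        print("handled", i)
        continue  # jumps to the next i; it does NOT resume at "after"
```

For `i == 1` this prints "before 1" and then "handled 1", and goes straight on to `i == 2`; the line printing "after 1" is never reached again.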

So something along these lines should help with the problem you're seeing, for example when the phone number is missing/not found:

    for link in link_list:
        driver.get(', '.join(link))
        time.sleep(2)

        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
        title = driver.find_element_by_css_selector("h1.b-annot__title.mb-5")

        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
        offers = driver.find_element_by_css_selector("span.btn__text")

        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
        addresses = driver.find_element_by_css_selector("p.font-sm")

        try:
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon.measuring-data-layer")))
            phone_number = driver.find_element_by_css_selector("a.item-icon.measuring-data-layer")
        except Exception as e:
            print("Phone number exception: %s" % str(e))
            continue

        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon")))
        email = driver.find_element_by_css_selector("a.item-icon")

        print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number.text, " ", email.text)
        writer.writerow([title.text, offers.text, addresses.text, phone_number.text, email.text])

You could catch the specific exceptions (TimeoutException from the wait, or NoSuchElementException from the find_element command) to be a bit more precise for each element.

Or you could call find_elements_by_css_selector (plural) and check that the returned list is non-empty before proceeding, but I'd prefer the try/except myself.
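That length check works because the plural find_elements_by_css_selector returns an empty list instead of raising an exception when nothing matches. A hedged sketch of the pattern, with stub classes standing in for the real Selenium driver and elements (the stub names and the `text_or_empty` helper are illustrative, not Selenium API):

```python
class StubElement:
    """Stands in for a Selenium WebElement; only .text is used here."""
    def __init__(self, text):
        self.text = text

class StubDriver:
    """Stands in for the Selenium driver; returns canned elements."""
    def __init__(self, elements):
        self._elements = elements

    def find_elements_by_css_selector(self, selector):
        # The plural form returns [] when nothing matches, no exception.
        return list(self._elements)

def text_or_empty(driver, selector):
    """Return the first matching element's text, or "" if none match."""
    elems = driver.find_elements_by_css_selector(selector)
    return elems[0].text if elems else ""

# With the real driver this would look like:
# phone_text = text_or_empty(driver, "a.item-icon.measuring-data-layer")
print(text_or_empty(StubDriver([StubElement("+420 123 456 789")]), "a.item-icon"))
print(text_or_empty(StubDriver([]), "a.item-icon"))  # missing element gives ""
```

Writing `phone_text` (a plain string) into the CSV row instead of `phone_number.text` is what lets the missing value land as an empty slot rather than crashing.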

DMart