
I want to submit the form at https://nhqrnet.ahrq.gov/inhqrdr/data/submit for every possible combination of dropdown values and download each resulting Excel file. My code completes the first iteration, but on the second iteration it fails with the error shown below.

My code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
import time


browser = webdriver.Chrome(executable_path="C:/Users/IN027/chromedriver.exe")
browser.get('https://nhqrnet.ahrq.gov/inhqrdr/data/submit')


dropdown = Select(browser.find_element_by_name("stateName"))

list=[]
for element in browser.find_element_by_id('stateName').find_elements_by_tag_name('option'):
    list.append(element.get_attribute('value'))
    list=[z for z in list if z!='']
    for i in list:
        dropdown.select_by_value(i)
        # time.sleep(3)
        dropdown1 = Select(browser.find_element_by_name("subjectAreaId"))
        list3=[]
        for element1 in browser.find_element_by_id('subjectAreas').find_elements_by_tag_name('option'):
            list3.append(element1.get_attribute('value'))
            list3= [m for m in list3 if m!='']
            for z in list3:
                dropdown1.select_by_value(z)

                time.sleep(2)
                dropdown2 = Select(browser.find_element_by_name("topicId"))
                a = []

                for element2 in browser.find_element_by_id('topics').find_elements_by_tag_name('option'):
                    a.append(element2.get_attribute('value'))
                    a = [x for x in a if x!='']
                    for d in a:
                        time.sleep(2)
                        dropdown2.select_by_value(d)
                        dropdown3= Select(browser.find_element_by_name("subMeasureId"))
                        list2=[]
                        print('check1')
                        for element3 in browser.find_element_by_id('measures').find_elements_by_tag_name('option'):
                            #time.sleep(2)
                            print(element3.text)
                            # print(element3.get_attribute('value'))
                            time.sleep(2)
                            list2.append(element3.get_attribute('value'))
                            list2= [y for y in list2 if y!='']
                            print(len(list2))
                            for b in list2:
                                print('check2')
                                time.sleep(2)
                                dropdown3.select_by_value(b)
                                #browser.implicitly_wait(10)
                                #browser.find_element_by_xpath("//*[@id='filterByCategory']").click
                                time.sleep(0.5)
                                browser.find_elements_by_css_selector("input[type='radio'][value='byTotal']")[0].click()
                                time.sleep(0.5)
                                #browser.implicitly_wait(10)
                                #browser.find_element_by_xpath("//*[@id='query']/form/input[2]").click
                                browser.find_elements_by_css_selector("input[type='submit'][value='Get Data']")[0].click()

                                time.sleep(4)
                                browser.find_element_by_id('table').find_elements_by_tag_name('a')[0].click()

                                time.sleep(2)

                                browser.find_element_by_id('formTab').find_elements_by_tag_name('a')[0].click()

The error message is given below:

Traceback (most recent call last):
  File "C:/Users/IN027/web_scraping/sel_scraper.py", line 60, in <module>
    list2.append(element3.get_attribute('value'))
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 143, in get_attribute
    resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=72.0.3626.109)
  (Driver info: chromedriver=71.0.3578.137 (86ee722808adfe9e3c92e6e8ea746ade08423c7e),platform=Windows NT 10.0.16299 x86_64)
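The traceback most likely comes from keeping option WebElements across page changes: the "for element3 in ...find_elements_by_tag_name('option')" loop is still iterating when the inner "for b in list2" loop submits the form and clicks the form tab, which reloads the page, so the next element3 is no longer attached to the document (selecting a subject area or topic can have the same effect if the page's JavaScript rebuilds the dependent dropdowns). Each "for ... in list" loop also sits inside the loop that is still building that list, so selections start before all values have been read. Below is a minimal sketch of one way around this, assuming Selenium 4 (find_element(By.NAME, ...) instead of the find_element_by_* helpers, which were deprecated and later removed) and that the field names and selectors from the question are still valid: read option values as plain strings with one JavaScript call, re-locate each select element right before using it, collect every combination first, and only then reload the form once per combination. The helper names (walk_combinations, make_selenium_callbacks, submit_combination) are hypothetical, and the fixed pause is a crude stand-in for a proper wait condition.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple

# Dropdown names used in the question, outermost first.
FIELDS = ("stateName", "subjectAreaId", "topicId", "subMeasureId")
FORM_URL = "https://nhqrnet.ahrq.gov/inhqrdr/data/submit"


def walk_combinations(
    fields: Sequence[str],
    read_values: Callable[[str], List[str]],
    choose: Callable[[str, str], None],
    prefix: Tuple[str, ...] = (),
) -> Iterator[Tuple[str, ...]]:
    """Yield every combination of cascading dropdown values, depth first.

    read_values(field) must return the current option values as plain
    strings, so nothing kept across iterations can go stale.
    """
    if not fields:
        yield prefix
        return
    field, rest = fields[0], fields[1:]
    for value in read_values(field):  # a list of str, safe to keep around
        if rest:
            # Only dropdowns that feed another dropdown need selecting here;
            # the page's JS may then rebuild the dropdowns that follow.
            choose(field, value)
        yield from walk_combinations(rest, read_values, choose, prefix + (value,))


def make_selenium_callbacks(driver, pause: float = 2.0):
    """Build read_values/choose for a Selenium 4 driver (assumed setup)."""
    # Imported here so walk_combinations can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    def read_values(field: str) -> List[str]:
        # One JavaScript call reads every option value at once; no WebElement
        # is returned, so there is nothing that can become stale.
        values = driver.execute_script(
            "return Array.from(document.getElementsByName(arguments[0])[0].options)"
            ".map(function (o) { return o.value; });",
            field,
        )
        return [v for v in values if v]  # drop the empty placeholder option

    def choose(field: str, value: str) -> None:
        # Re-locate the select element on every call instead of reusing an old one.
        Select(driver.find_element(By.NAME, field)).select_by_value(value)
        time.sleep(pause)  # crude wait for the next dropdown to be refilled

    return read_values, choose


def submit_combination(driver, combo: Sequence[str], pause: float = 2.0) -> None:
    """Reload the form, apply one combination and click the first download link."""
    from selenium.webdriver.common.by import By

    _, choose = make_selenium_callbacks(driver, pause)
    driver.get(FORM_URL)  # fresh page: no references survive from the last round
    for field, value in zip(FIELDS, combo):
        choose(field, value)
    driver.find_element(By.CSS_SELECTOR, "input[type='radio'][value='byTotal']").click()
    driver.find_element(By.CSS_SELECTOR, "input[type='submit'][value='Get Data']").click()
    time.sleep(2 * pause)  # crude wait for the results table
    driver.find_element(By.ID, "table").find_elements(By.TAG_NAME, "a")[0].click()
```

With a driver created as in the question, usage would look like: driver.get(FORM_URL); read_values, choose = make_selenium_callbacks(driver); combos = list(walk_combinations(FIELDS, read_values, choose)); then call submit_combination(driver, combo) for each combo. Enumerating first keeps the page-reloading submit step out of the loops that read dropdown values.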
  • Wow, that's a lot of code for such a simple task. You don't even need Selenium: the API returns JSON, so just write nested loops to get the IDs and construct the form request. – Boy Feb 21 '19 at 10:26
  • @Boy I would warn the user that trying every combination may generate a very large number of requests, so it's worth checking whether the API (official API?) has a request limit. – QHarr Feb 21 '19 at 10:29
  • @QHarr, it's not that many requests... I'm not encouraging web scraping, just saying... – Boy Feb 21 '19 at 10:34
  • @Boy I'd have to disagree if they are doing all combinations. Bear in mind that, whilst the top level has only 53 state options, you then have 8 subject areas which determine topic lists of varying length (up to 15), and then the measures list is enormous. This would resemble a DDoS if I ran something to do all possible combinations with XHR (not sure what happens with the API, but I cannot imagine this is within a daily call limit). – QHarr Feb 21 '19 at 10:45
  • @QHarr Firstly, it cannot be DDoS, because I guess it would be run from a single server in a single thread, so it's DoS. Secondly, all those requests to get the IDs are very small, a few MB in total; what kind of server cannot handle a single synchronous user? I guess the bottleneck would be the .xls files, even though I think this was meant for one-time use... or maybe I'm wrong... – Boy Feb 21 '19 at 10:54
  • I'll bow to your greater knowledge, but it seems like a large number to me for all combinations. I guess the OP will find out. – QHarr Feb 21 '19 at 10:59
  • @QHarr Well, it seems they don't see it that way; I think they don't care, otherwise they wouldn't have written JS that repeatedly requests the same data from the server without caching it. They could have just included all the IDs in a single request and reduced the number of calls, but there are still the .xls files, which are the biggest problem. Anyway, you can't expect people not to download public data; there will always be requests like that... If they do care about server resources, they could just provide an official API. – Boy Feb 21 '19 at 11:18
  • Good points all – QHarr Feb 21 '19 at 11:25
  • @QHarr Was that sarcastic? XD – Boy Feb 21 '19 at 13:47
  • @Boy No. It was genuine. I am here to learn so appreciate when someone takes the time to point things out. – QHarr Feb 21 '19 at 13:48
  • Have you tried using 'while True:'? You can run through all iterations if you provide the correct conditions; 'while True:' will keep looping as long as some condition is still True. – Julian Silvestri Feb 21 '19 at 14:02
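Boy's suggestion to skip Selenium and QHarr's warning about request volume can both be made a little more concrete. The sketch below uses only the standard library: it computes the upper bound on (state, subject area, topic) triples implied by the figures quoted in the comments, 53 × 8 × 15 = 6,360, before any measures are multiplied in, and it wraps calls so that consecutive requests start at least min_interval seconds apart. The site's actual JSON endpoints are not given in this thread, so fetch_json simply takes whatever URL you pass; max_state_topic_pairs, Throttle and fetch_json are hypothetical names for illustration.

```python
import json
import time
import urllib.request
from typing import Any, Callable

# Figures quoted in the comments: 53 states, 8 subject areas and at most
# 15 topics per subject area (the measures multiply this further).
STATES, SUBJECT_AREAS, MAX_TOPICS = 53, 8, 15


def max_state_topic_pairs() -> int:
    """Upper bound on (state, subject area, topic) triples before measures."""
    return STATES * SUBJECT_AREAS * MAX_TOPICS


class Throttle:
    """Call wrapper that keeps at least min_interval seconds between the
    starts of consecutive calls."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = float("-inf")  # no call made yet, so the first one runs at once

    def __call__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        wait = self._last + self.min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()
        return func(*args, **kwargs)


def fetch_json(url: str) -> Any:
    """Plain GET that decodes a JSON body; which URL to call is up to you."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

For example, creating throttle = Throttle(1.0) and then calling throttle(fetch_json, url) for each ID request keeps the crawl to at most one request per second; the one-second interval is an arbitrary choice, not a documented limit of the site.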
