
I am trying to extract some data from a website. However, the site has a hierarchical structure: there is a dropdown menu at the top whose option values are URLs. So my approach is:

  1. find the dropdown box,
  2. select an option,
  3. extract some data, and
  4. repeat steps 2 and 3 for all available options.

With the code below I can extract the data for the default selected option (the first one), but then I get the error `Message: Element not found in the cache - perhaps the page has changed since it was looked up`. It seems like my browser was not switched to the new page. I tried `time.sleep()` and `driver.refresh()`, but neither helped. Any suggestions are appreciated!

### HTML
<select class="form-control"> 
    <option value="/en/url1">001 Key</option>
    <option value="/en/url2">002 Key</option>
</select>

### Python code

import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# driver: an already-open WebDriver on the page that holds the dropdown

# select the dropdown menu
select_box = Select(driver.find_element(By.XPATH, "//select[@class='form-control']"))
# get all options
options = select_box.options

for ele_index, element in enumerate(options):
    # select a url
    select_box.select_by_index(ele_index)
    time.sleep(5)
    print(element.text)

    # extract page data
    id_comp_html = driver.find_elements(By.CLASS_NAME, 'HorDL')
    for q in id_comp_html:
        print(q.get_attribute("innerHTML"))
        print("=============")

Update 1 (based on alecxe's answer):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# dropdown menu
select_box = Select(driver.find_element(By.XPATH, "//select[@class='form-control']"))
options = select_box.options

for ele_index in range(len(options)):
    # select a url
    select_box = Select(driver.find_element(By.XPATH, "//select[@class='form-control']"))
    print(select_box.options[ele_index].text)
    select_box.select_by_index(ele_index)
    # print(element.text)
    # print("======")
    # note: implicitly_wait sets the timeout for later element lookups; it does not pause here
    driver.implicitly_wait(5)
    id_comp_html = driver.find_elements(By.CLASS_NAME, 'HorDL')
    for q in id_comp_html:
        print(q.get_attribute("innerHTML"))
        print("=============")
TTT

2 Answers


Your `select_box` and `element` references have gone stale. You have to "re-find" the select element inside the loop and work with the option indexes:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# select the dropdown menu
select_box = Select(driver.find_element(By.XPATH, "//select[@class='form-control']"))
# get all options
options = select_box.options

for ele_index in range(len(options)):
    # select a url
    select_box = Select(driver.find_element(By.XPATH, "//select[@class='form-control']"))
    select_box.select_by_index(ele_index)

    # ...
    element = select_box.options[ele_index]

You might also need to navigate back after selecting an option and extracting the desired data. This can be done via driver.back().

alecxe
  • Thanks for your comment! I added `element = select_box.options[ele_index]` to the end of this loop, but still got the same error... – TTT May 03 '17 at 07:01
  • @tao.hong wait, have you copied all of the presented code, including the part that redefines `select_box`? – alecxe May 03 '17 at 09:38
  • correct, see my update 1 in the post. Maybe the `driver` is still not refreshed? – TTT May 03 '17 at 14:24
  • @tao.hong right. Let's try the following. What if, just for testing purposes, you add `time.sleep(5)` as the first line of the loop? (don't forget to import time) – alecxe May 03 '17 at 15:04
  • Thx for staying with me. I moved `time.sleep(5)` to the beginning of the code, but still have the same error... – TTT May 03 '17 at 15:40
  • @tao.hong interesting, on what line does it fail? – alecxe May 03 '17 at 15:41
  • looks like it's about `element`: `Traceback (most recent call last): print element.get_attribute("value")` – TTT May 03 '17 at 15:42
  • @tao.hong ah, judging by your updated code in the question - you did not change the loop as in my answer - note how we get the `element` inside the loop. – alecxe May 03 '17 at 15:44
  • got a working version. I think `element = select_box.options[ele_index]` should be before `select_box.select_by_index(ele_index)`. Thanks for your help! – TTT May 03 '17 at 16:04
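The fix TTT arrived at (look the option up again on every iteration, read its text *before* selecting it, then extract) can be sketched as below. This is a hedged sketch rather than the poster's exact code. It assumes, as in the question, that the options live under `//select[@class='form-control']` and the data under the class `HorDL`. Instead of `Select.select_by_index` it clicks the `<option>` element directly, which is what `Select` does internally. The locator strings `"xpath"` and `"class name"` are the values behind Selenium's `By.XPATH` and `By.CLASS_NAME`, written literally so the sketch loads without Selenium installed; `scrape_each_option`, `pause` and `go_back` are hypothetical names.

```python
import time

# Literal values of selenium's By.XPATH and By.CLASS_NAME (assumption:
# in real code you would normally import By instead).
XPATH = "xpath"
CLASS_NAME = "class name"
# Assumed from the question's HTML.
OPTIONS_XPATH = "//select[@class='form-control']/option"


def scrape_each_option(driver, data_class="HorDL", pause=0.0, go_back=False):
    """Hypothetical helper: select each dropdown option in turn and
    return {option text: [innerHTML of each data block]}."""
    results = {}
    count = len(driver.find_elements(XPATH, OPTIONS_XPATH))
    for index in range(count):
        # Re-find the options on the page we are on *now*; references
        # taken from an earlier page would raise a stale-element error.
        option = driver.find_elements(XPATH, OPTIONS_XPATH)[index]
        # Read everything needed from the option BEFORE selecting it,
        # because selecting may load a new page and make `option` stale.
        label = option.text
        if not option.is_selected():
            option.click()  # Select.select_by_index also clicks the option
        # Crude wait as in the question; an explicit WebDriverWait is more robust.
        time.sleep(pause)
        blocks = driver.find_elements(CLASS_NAME, data_class)
        results[label] = [b.get_attribute("innerHTML") for b in blocks]
        if go_back:
            # Optional: return to the previous page, as alecxe suggests.
            driver.back()
    return results
```

The key point is the order inside the loop: every value that is needed later is copied into a plain Python string while the element is still attached to the current page.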

From your description of the site and your code, it looks like selecting an option from the dropdown sends you to another page. So after the first iteration of the for loop you are on a different page, while your `options` variable still points to elements on the previous page.

One solution, specific to your situation (and likely the best one here), would be to store the option values (i.e. the URLs) up front and then navigate to each URL directly with `driver.get()`.
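The store-the-URLs approach can be sketched as follows: copy the option labels and values out as plain strings while the page is still current, then visit each URL with `driver.get()`. Because nothing is kept from the previous page, no reference can go stale. This is a sketch under assumptions: the option values are relative paths like `/en/url1` (as in the question's HTML), so they are resolved against `driver.current_url` with `urllib.parse.urljoin`. The helper names are hypothetical, and the literal locator strings again stand in for `By.XPATH` and `By.CLASS_NAME`.

```python
from urllib.parse import urljoin

# Literal values of selenium's By.XPATH and By.CLASS_NAME.
XPATH = "xpath"
CLASS_NAME = "class name"
# Assumed from the question's HTML.
OPTIONS_XPATH = "//select[@class='form-control']/option"


def collect_option_urls(driver):
    """Hypothetical helper: return (label, absolute URL) for every option."""
    base = driver.current_url
    pairs = []
    for option in driver.find_elements(XPATH, OPTIONS_XPATH):
        # Copy plain strings now; the WebElements go stale once we navigate.
        pairs.append((option.text, urljoin(base, option.get_attribute("value"))))
    return pairs


def scrape_by_url(driver, data_class="HorDL"):
    """Hypothetical helper: visit each option's URL and collect the data."""
    results = {}
    for label, url in collect_option_urls(driver):
        # By default get() waits for the page load to finish; content added
        # later by JavaScript may still need an explicit wait.
        driver.get(url)
        blocks = driver.find_elements(CLASS_NAME, data_class)
        results[label] = [b.get_attribute("innerHTML") for b in blocks]
    return results
```

With a real browser you would first open the page containing the dropdown (e.g. `driver = webdriver.Chrome()` followed by `driver.get(...)` with the site's URL) and then call `scrape_by_url(driver)`. Compared with clicking through the dropdown, this avoids both the stale-element problem and the need for `driver.back()`.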

Otherwise, you would need either to keep a counter and re-fetch the dropdown contents on every iteration, or to navigate back after every iteration; neither is necessary in this case.

Ken Wei