I'm having an issue crawling pages on Amazon.

I've tried using:

  • Executing JS Script
  • Action Chains
  • Explicit Waits

Nothing seems to work. Everything throws one exception or error or another.

Base Script

ff = create_webdriver_instance()
ff.get('https://www.amazon.ca/gp/goldbox/ref=gbps_ftr_s-3_4bc8_dct_10-?gb_f_c2xvdC0z=sortOrder:BY_SCORE,discountRanges:10-25%252C25-50%252C50-70%252C70-&pf_rd_p=f5836aee-0969-4c39-9720-4f0cacf64bc8&pf_rd_s=slot-3&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=A3DWYIK6Y9EEQB&pf_rd_r=CQ7KBNXT36G95190QJB1&ie=UTF8')
next_button = ff.find_element_by_xpath('(//li/a[contains(text(), "Next")])[1]')

Attempt #1: Executing JS

Script

ff.execute_script('arguments[0].scrollIntoView()', next_button)

Error

Element could not be scrolled into view

Attempt #2: Action Chain

Script

actions = ActionChains(ff)
actions.move_to_element(next_button)
actions.click(next_button)
actions.perform()

Error

TypeError: rect is undefined

Attempt #3: Explicit Wait

next_button = WebDriverWait(ff, 60).until(
    EC.visibility_of_element_located((By.XPATH, '(//li/a[contains(text(), "Next")])[1]'))
)

I've also tried using element_to_be_clickable. Both of these end up timing out.

oldboy

1 Answer

That's because you're trying to handle a hidden link. Try the following instead:

next_button = ff.find_element_by_partial_link_text('Next')
next_button.click()

or

next_button = ff.find_element_by_link_text('Next→')

Note that find_element_by_partial_link_text and find_element_by_link_text match visible links only.
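
If you would rather keep the XPath, one option is to collect every match and pick the one that is actually displayed, instead of hard-coding `[1]` or `[2]`. This is only a minimal sketch: `first_displayed` is a made-up helper name, and it assumes `elements` is the list returned by a `find_elements_*` call on your driver. The only Selenium call it relies on is `WebElement.is_displayed()`.

```python
def first_displayed(elements):
    """Return the first element whose is_displayed() is true, or None."""
    for element in elements:
        if element.is_displayed():
            return element
    return None


# Possible usage with the question's driver (not executed here):
#   candidates = ff.find_elements_by_xpath('//li/a[contains(text(), "Next")]')
#   next_button = first_displayed(candidates)
#   if next_button is not None:
#       next_button.click()
```

Because the helper checks visibility at call time, it should not matter which of the two links happens to be the hidden one on a given page.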

Also you might need to call

ff.implicitly_wait(10)

once in your script (somewhere after your WebDriver instance definition), or use an explicit wait as below

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

next_button = WebDriverWait(ff, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next→')))

to be sure that the required element is found even if rendering is delayed.
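
To make it clearer what a wait buys you: the lookup is simply retried until it succeeds or a deadline passes, so an element that is rendered a moment later is still found. The sketch below is not Selenium's implementation, only an illustration of that polling idea in plain Python; `wait_until` and its parameters are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored=()):
    """Call condition() until it returns a truthy value or the timeout expires.

    Exceptions whose types are listed in `ignored` count as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # element not there yet, try again after a short sleep
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)
```

Without any wait, the lookup runs exactly once, so a page that renders its "Next" link slightly later than the others can fail even though the locator itself is fine.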

Andersson
  • But the button is visible in DOM and not hidden?!?! so why didnt my methods work? – oldboy Sep 17 '18 at 23:14
  • No. The button found by `(//li/a[contains(text(), "Next")])[1]` is located inside a `div` with class name `hidden`. Note that index `[1]` in XPath means *first*. If you want to handle the *second* (*visible*) link, you should use `(//li/a[contains(text(), "Next")])[2]`. But as I've said, searching by link text is more reliable, as it ignores all hidden links... – Andersson Sep 18 '18 at 04:45
  • youre wrong, only `[2]` has a parent with `hidden`, so how come `(//li/a[contains(text(), "Next")])[1]` doesn't work?? – oldboy Sep 19 '18 at 02:55
  • `(//li/a[contains(text(), "Next")])[1]` has no problem finding the link, but for some reason it cannot scroll it into view... – oldboy Sep 19 '18 at 03:01
  • Have no idea why you still disagree. First link is obviously hidden, second one - visible. If you don't believe me, try `print(dr.find_element_by_xpath('(//li/a[contains(text(), "Next")])[1]').is_displayed())` and `print(dr.find_element_by_xpath('(//li/a[contains(text(), "Next")])[2]').is_displayed())` – Andersson Sep 19 '18 at 04:36
  • i'm disagreeing because according to the source code of the website it's `[2]` whose parent is `hidden` whereas `[1]` is visible. have you looked at the source code?? – oldboy Sep 19 '18 at 04:57
  • Yeah. I did it couple of times :) First is still hidden – Andersson Sep 19 '18 at 04:59
  • i just looped thru it. `[1]` is hidden only on the initial page, then it switches for the rest of it. but it keeps failing on page 10 for some reason... – oldboy Sep 19 '18 at 05:03
  • `[1] False [2] True | [1] True [2] False | [1] True [2] False | [1] True [2] False | [1] True [2] False | [1] True [2] False | [1] True [2] False | [1] True [2] False | [1] True [2] False` – oldboy Sep 19 '18 at 05:03
  • But you need generic solution that should work on *each* page *including initial* where first link is hidden, right? – Andersson Sep 19 '18 at 05:05
  • Try this also `ff.find_element_by_link_text('Next →')`. I'm not sure about that `→` as it's not an ASCII char. But search by full link text should avoid matching links that contains "Next" substring – Andersson Sep 19 '18 at 05:08
  • ok. for some reason it keeps failing on page 10... i cant seem to figure out why this is happening – oldboy Sep 19 '18 at 05:10
  • Yeah, it should be exact character as it appears on page. It might be incorrectly interpreted when copying from comment... – Andersson Sep 19 '18 at 05:15
  • ive tried it doesnt work. why does the script keep breaking on page 10? – oldboy Sep 19 '18 at 05:18
  • Which code line? What is the Exception? Also try to copy `Next →` directly from page into your Python GUI (e.g. Pycharm) – Andersson Sep 19 '18 at 05:26
  • `NoSuchElementException: Unable to locate element: Next` from `ff.find_elements_by_partial_link_text('Next')`. ill have to try the PyCharm thing tomo, im just going to bed – oldboy Sep 19 '18 at 05:51
  • Try to call `ff.implicitly_wait(10)` once in the top of script and then `ff.find_element_by_link_text('Next→').click()`. Works fine for me on all the pages – Andersson Sep 19 '18 at 06:53
  • fuck i cant believe it. i removed my implicit wait just for testing and of course it turns out to be that... lol – oldboy Sep 20 '18 at 01:46