XPath on the site keeps changing

Question

Hello. I recorded a test on the Amazon site: it searches for "iphone 11", and on the results page I pick the first product. Its XPath is:

//span[contains(text(), 'Apple iPhone 11 (64GB) - Black')]

The problem is that this XPath works today, but tomorrow it will break because the first product will be a different one, for example:

//span[contains(text(), 'Apple iPhone 11 Pro (64GB) - Space Gray')]

How can I always select the first product among all the iPhones, even when that product changes? Thanks.

This is the page:

https://www.amazon.co.uk/s?k=iphone+11&crid=3GCCCW0Q2Z1MQ&sprefix=iph%2Caps%2C220&ref=nb_sb_noss_2

This is the big problem with screen-scraping. You're processing data that doesn't conform to any specification or standard, you have no control over how it changes from day to day. Screen scraping is one big exercise in inspired guesswork. There is no right answer. — Michael Kay, May 10 '20 at 14:20
If you want any kind of robustness, use an API in preference to screen scraping. See for example https://stackoverflow.com/questions/1595624/amazon-products-api-looking-for-basic-overview-and-information — Michael Kay, May 10 '20 at 14:22

score 1 · Accepted Answer · answered May 10 '20 at 11:13


Use the following XPath with an index to get the first element:

(//a[@class='a-link-normal a-text-normal']/span)[1]
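The parentheses matter: (...)[1] takes the first match over the whole document, whereas //a/span[1] applies [1] separately under each parent. Below is a small offline sketch using Python's standard-library xml.etree.ElementTree on a hypothetical, simplified fragment of result markup (not Amazon's real HTML, which a browser or an HTML parser would normally handle). ElementTree supports only a subset of XPath, so the grouped (...)[1] is emulated by collecting all matches in document order and taking the first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified search-results markup (not Amazon's real HTML).
PAGE = """
<div>
  <div class="result">
    <h2><a class="a-link-normal a-text-normal"><span>Apple iPhone 11 Pro (64GB) - Space Gray</span></a></h2>
  </div>
  <div class="result">
    <h2><a class="a-link-normal a-text-normal"><span>Apple iPhone 11 (64GB) - Black</span></a></h2>
  </div>
</div>
"""

def first_title(markup: str) -> str:
    root = ET.fromstring(markup)
    # Emulates (//a[@class='a-link-normal a-text-normal']/span)[1]:
    # gather every matching span in document order, then take the first.
    spans = root.findall(".//a[@class='a-link-normal a-text-normal']/span")
    return spans[0].text

root = ET.fromstring(PAGE)
# Without the grouping, [1] is applied per parent <a>, so every link
# contributes its own first <span> and we get one match per result.
per_parent = root.findall(".//a[@class='a-link-normal a-text-normal']/span[1]")

print(first_title(PAGE))  # Apple iPhone 11 Pro (64GB) - Space Gray
print(len(per_parent))    # 2
```

Because the lookup never mentions the product text, it keeps returning whichever title comes first, even after the listing changes.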

KunduK

Answer 2 · answered May 10 '20 at 11:29 · Peter

You could use the class of the search-result title span:

//span[@class="a-size-medium a-color-base a-text-normal"]

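Then you can do:

from selenium.webdriver.common.by import By

# assumes `driver` is a WebDriver already showing the search results page
first_iphone = driver.find_element(By.XPATH, '//span[@class="a-size-medium a-color-base a-text-normal"]')

Although all search results share the same class (in this case a-size-medium a-color-base a-text-normal), find_element returns only the first matching element. The older find_element_by_xpath method, removed in recent Selenium releases, behaved the same way.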
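The "first match wins" behaviour can be tried offline with the standard library's xml.etree.ElementTree, whose find() returns the first matching element in document order and findall() returns all of them, similar to Selenium's single- and multi-element lookups. The markup below is a hypothetical stand-in for the results page. Note that @class='...' compares the whole attribute string, so the match depends on the class list staying exactly the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every result title shares the same class string.
PAGE = """
<div>
  <span class="a-size-medium a-color-base a-text-normal">Apple iPhone 11 (64GB) - Black</span>
  <span class="a-size-medium a-color-base a-text-normal">Apple iPhone 11 Pro (64GB) - Space Gray</span>
</div>
"""

root = ET.fromstring(PAGE)
title_path = ".//span[@class='a-size-medium a-color-base a-text-normal']"

# find() stops at the first match in document order.
first = root.find(title_path)
# findall() returns every match, in document order.
all_titles = [span.text for span in root.findall(title_path)]

print(first.text)       # Apple iPhone 11 (64GB) - Black
print(len(all_titles))  # 2
```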

score 0 · Answer 3 · answered May 10 '20 at 11:25

Always try to find something on the page that is very unlikely to change. If the element you're looking for doesn't have such a property, look at its ancestors.

For example, in this case one of this span's ancestors has cel_widget_id="MAIN-SEARCH_RESULTS", which will most likely remain constant. So the following XPath:

//span[@cel_widget_id="MAIN-SEARCH_RESULTS"]//h2/a/span

will give you all such titles. You can get the first one with:

(//span[@cel_widget_id="MAIN-SEARCH_RESULTS"]//h2/a/span)[1]
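A minimal offline sketch of anchoring on a stable ancestor, again with the standard library's xml.etree.ElementTree and hypothetical markup. The SPONSORED-BANNER container is invented purely to show that titles outside the MAIN-SEARCH_RESULTS ancestor are ignored; the inner elements carry no classes at all, so the path does not depend on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: titles sit under a stable container, and a second
# (invented) container holds a title that should NOT be picked up.
PAGE = """
<div>
  <span cel_widget_id="SPONSORED-BANNER">
    <h2><a><span>Some sponsored product</span></a></h2>
  </span>
  <span cel_widget_id="MAIN-SEARCH_RESULTS">
    <div><h2><a><span>Apple iPhone 11 (64GB) - Black</span></a></h2></div>
    <div><h2><a><span>Apple iPhone 11 Pro (64GB) - Space Gray</span></a></h2></div>
  </span>
</div>
"""

root = ET.fromstring(PAGE)
# Same shape as //span[@cel_widget_id="MAIN-SEARCH_RESULTS"]//h2/a/span
titles = root.findall(".//span[@cel_widget_id='MAIN-SEARCH_RESULTS']//h2/a/span")

# Equivalent of (...)[1]: the first title in document order, whatever its text.
print(titles[0].text)  # Apple iPhone 11 (64GB) - Black
```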
