
I am getting the href link for some listings, while others return a None value.

I have the following snippet to retrieve the first 16 items on a page:

from selenium.webdriver.common.by import By

def loop_artikelen():
    # driver is an existing webdriver instance that already has the page open
    artikelen = driver.find_elements(By.XPATH, "//*[@id='content']/div[2]/ul/li")
    artikelen_lijst = []
    for artikel in artikelen[0:15]:
        titel = artikel.find_element(By.CLASS_NAME, 'hz-Listing-title').text
        prijs = artikel.find_element(By.CLASS_NAME, 'hz-Listing-price').text
        link = artikel.find_element(By.CLASS_NAME, 'hz-Listing-coverLink').get_attribute('href')
        #if link == "None":
        #   link = artikel.find_element(By.XPATH(".//a").get_attribute('href'))
        artikel = titel, prijs, link
        artikelen_lijst.append(artikel)
    return artikelen_lijst

The output looks like this when I print it out:

('Fiets gestolen dus voor een mooi prijsje is ie van jou', '€ 400,00', None)
('Amslod middenmoter fiets', '€ 1.500,00', None)
('Batavus damesfiets', '€ 90,00', 'https://www.marktplaats.nl/v/fietsen-en-brommers/fietsen-dames-damesfietsen/m1933195519-batavus-damesfiets')
('Time edge', '€ 700,00', 'https://www.marktplaats.nl/v/fietsen-en-brommers/fietsen-racefietsen/m1933185638-time-edge')

I tried adding a time.sleep(2) between the link and artikel lines, but it didn't work. As you can see, I also tried something else (the lines commented out with "#"), but that didn't work either.

Can anyone help me?

Thanks in advance

Link to site : https://www.marktplaats.nl/q/fiets/#offeredSince:Vandaag|sortBy:SORT_INDEX|sortOrder:DECREASING|

  • This question is missing details. We need to see that web page to try to help you – Prophet Jan 16 '23 at 19:14
  • I'm sorry, I added the site link :) – okanmutluprogramming Jan 16 '23 at 19:16
  • Hm.. Not sure, but try to scroll each product into view first, give it some time to load, and then extract its details. – Prophet Jan 16 '23 at 19:29
  • I think Python is converting the value from null to "None". Null would suggest that it did not find an "href" attribute or property for the element you are targeting. You should use more specific locators as css class names are often shared among many elements. – pcalkins Jan 16 '23 at 22:47

2 Answers


This seems like a diagnostic issue, and since you already seem fairly comfortable with Selenium, I'll give some general advice for debugging these sorts of problems rather than trying to solve this specific one. Not being able to find a web element is a very common problem. Here are some things that could be wrong:

  1. The elements are not being found because your query is wrong. A lot of websites have screwy HTML, for whatever reason, and sometimes something that looks like a list cannot be found with a single XPath query. Also, I highly recommend using CSS selectors instead of XPath: almost anything that can be located with an XPath can also be located with a CSS selector, and it generally yields better results.

  2. The elements are not being found because they haven't been loaded, either because the webpage needs to be scrolled down or because the website simply hasn't finished loading. You can try increasing the sleep timer to something like 60 seconds to see if that's the problem, and/or manually scroll the page down during those 60 seconds (see the sketch below).

I would try (2) first to see if that fixes your problem, since it is so easy to do, and only takes a minute.
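
For what it's worth, here is a rough, untested sketch of (2), assuming a Chrome driver and the same XPath you are already using: wait explicitly for the listings instead of sleeping blindly, then scroll each one into view before reading it.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.marktplaats.nl/q/fiets/")

# Wait (up to 60 seconds) until the listing elements are present on the page
wait = WebDriverWait(driver, 60)
artikelen = wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//*[@id='content']/div[2]/ul/li"))
)

for artikel in artikelen[0:15]:
    # Scroll each listing into view so lazily-loaded content gets a chance to render
    driver.execute_script("arguments[0].scrollIntoView(true);", artikel)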

EDIT: Since you mention "href links on some links and others returned None value": if Selenium can't find the element, it throws an exception; if it can't find the attribute, it returns None. So the problem might be that it can find an element but not an href attribute on it, in other words it is finding the wrong elements. Your problem is almost certainly that some of the elements are not links at all. I would recommend printing out all the elements you get to confirm that they are the ones you think they are. Also, use CSS selectors instead of XPath, because that will probably solve your problem.
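
As a starting point for that, here is an untested sketch (the selectors are guesses based on your XPath and class names) that prints what each "coverLink" match actually is:

# Untested sketch: grab the listings with a CSS selector and print each match's
# tag name and outerHTML, so you can see whether it really is an <a> tag with an href.
artikelen = driver.find_elements(By.CSS_SELECTOR, "#content ul li")
for artikel in artikelen[0:15]:
    cover = artikel.find_element(By.CSS_SELECTOR, "a.hz-Listing-coverLink")
    print(cover.tag_name, cover.get_attribute("outerHTML"))
    print(cover.get_attribute("href"))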

chenjesu

What you said is true: "Your problem is almost certainly that some of the elements are not links at all."

So I looked deeper into the selectors and noticed the following: the element that returns None as a value is different from the one that returns a value, but the difference is very small:

<a class="hz-Link hz-Link--block hz-Listing-coverLink" href="/v/spelcomputers-en-games/games-nintendo-2ds-en-3ds/m1934291750-animal-crossing-new-leaf-2ds-3ds?correlationId=6ffb1c0a-3d23-4a00-ab3b-a16587b61dea">

Versus

<a class="hz-Link hz-Link--block hz-Listing-coverLink" tabindex="0" href="/v/spelcomputers-en-games/games-nintendo-2ds-en-3ds/m1934287204-mario-kart-7-nintendo-3ds?correlationId=6ffb1c0a-3d23-4a00-ab3b-a16587b61dea">

The second one contains tabindex="0", and the same is true for all the other elements that give a None value. So I tried to retrieve the URL by tabindex, but I didn't quite get it right. I added an if statement: if the value is None, run this line: link = artikel.find_element(By.ID, '0').get_attribute('href'). This didn't get the job done.
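
Roughly, my attempt looked like this (a sketch of what I described above):

# What I tried: fall back to another lookup whenever the href comes back as None.
# Note: By.ID matches the id attribute, not tabindex, so '0' here never refers to tabindex="0".
link = artikel.find_element(By.CLASS_NAME, 'hz-Listing-coverLink').get_attribute('href')
if link is None:
    link = artikel.find_element(By.ID, '0').get_attribute('href')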

So I am wondering: how can I retrieve the href when the element contains a tabindex="0" attribute?

And this is where I realised you were wrong: I am actually not that proficient in Selenium :D