I am trying to get all the hrefs listed in a series of HTML element blocks. I know the hrefs all begin with "/wiki/", but I don't know how to refer to the href in a selector.

Is there a way to query the page for all hrefs that begin with this specific prefix?

Tayne
  • `href` is not an HTML element, it's an attribute of a link (an `a` element), so you need to query for all links and then filter out those you don't want. – pavelsaman Jun 17 '21 at 09:52
  • I am querying the correct areas and running it in a for loop; however, that doesn't help me with how to actually get the link from the href. – Tayne Jun 18 '21 at 16:19
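The loop-based approach from the comments can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: it assumes `page` is a Playwright sync `Page`, and the helper name `wiki_links_by_loop` and its `prefix` parameter are made up for this example. Inside the loop, the raw attribute value is read from each element with `get_attribute("href")`.

```python
def wiki_links_by_loop(page, prefix="/wiki/"):
    """Return the raw href of every <a> element whose href starts with `prefix`.

    `page` is assumed to be a Playwright sync Page (hypothetical helper).
    """
    hrefs = []
    # Query every link on the page, then filter in Python.
    for link in page.query_selector_all("a"):
        # get_attribute returns None when the element has no href attribute.
        href = link.get_attribute("href")
        if href is not None and href.startswith(prefix):
            hrefs.append(href)
    return hrefs
```

This makes one round trip to the browser per link, so for pages with many links a single selector-based query is likely to be faster.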

1 Answer

You can do:

hrefs_of_page = page.eval_on_selector_all("a[href^='/wiki/']", "elements => elements.map(element => element.href)")

which should work for your use case. The selector matches all link (`a`) elements whose href attribute starts with /wiki/. The JavaScript expression is then evaluated in the browser, where it maps the array of matched elements to their href values, so a list of strings is returned on the Python side.
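For context, here is a slightly fuller sketch of how that one-liner might be wrapped. The helper names `wiki_hrefs` and `run_example`, the `absolute` flag, and the choice of Chromium are assumptions made for illustration; only `eval_on_selector_all` and the selector come from the answer. One detail worth knowing: in the browser, `element.href` yields the fully resolved absolute URL, while `element.getAttribute('href')` yields the raw attribute text such as `/wiki/...`.

```python
def wiki_hrefs(page, prefix="/wiki/", absolute=True):
    """Return hrefs of all <a> elements whose href attribute starts with `prefix`.

    `page` is assumed to be a Playwright sync Page. `prefix` must not contain
    a single quote, since it is placed inside a quoted CSS attribute value.
    """
    selector = f"a[href^='{prefix}']"
    if absolute:
        # element.href is the URL resolved against the page's base URL.
        expression = "elements => elements.map(element => element.href)"
    else:
        # getAttribute returns the attribute exactly as written in the HTML.
        expression = "elements => elements.map(element => element.getAttribute('href'))"
    return page.eval_on_selector_all(selector, expression)


def run_example(url):
    """Open `url` in headless Chromium and return its /wiki/ links."""
    # Imported lazily so wiki_hrefs can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        hrefs = wiki_hrefs(page)
        browser.close()
    return hrefs
```

Calling `run_example(...)` with the URL of the page being scraped should return a list of strings. Pass `absolute=False` to `wiki_hrefs` if the relative paths are wanted instead.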

Max Schmitt