95

I have a well-formed XHTML page. I want to find the destination URL of a link when I know the text that is linked.

Example

<a href="http://stackoverflow.com">programming questions site</a>
<a href="http://cnn.com">news</a>

I want an XPath expression such that, given "programming questions site", it returns http://stackoverflow.com, and given "news", it returns http://cnn.com.

Martin Thoma
flybywire

6 Answers

155

Should be something similar to:

//a[text()='text_i_want_to_find']/@href
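For anyone who wants to try this from code, here is a minimal sketch using only Python's standard library. It assumes the links from the question are wrapped in a root element (the `<div>` is made up for the example), that no XHTML namespace is declared, and that the link text contains no single quote. ElementTree implements only a subset of XPath and has no `/@href` step, so the sketch selects the `<a>` element with the `[.='...']` predicate (Python 3.7+) and reads the attribute afterwards. The helper name `href_for_link_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <div> root
# so that it parses as a single XML document.
XHTML = """<div>
<a href="http://stackoverflow.com">programming questions site</a>
<a href="http://cnn.com">news</a>
</div>"""


def href_for_link_text(markup, link_text):
    """Return the href of the first <a> whose full text equals link_text, or None."""
    root = ET.fromstring(markup)
    # ElementTree cannot select attributes with /@href, so find the element
    # and read the attribute in Python instead.
    # Assumption: link_text contains no single quote.
    match = root.find(f".//a[.='{link_text}']")
    return None if match is None else match.get("href")


print(href_for_link_text(XHTML, "programming questions site"))
print(href_for_link_text(XHTML, "news"))
```

A full XPath 1.0 engine can evaluate the `//a[text()='...']/@href` expression directly; the two-step version above is only a workaround for ElementTree's limited syntax.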
Badaro
  • 79
    will I ever learn xpath? when I see a query it is so obvious and easy to understand... but I am never able to write one on my own – flybywire May 27 '09 at 12:18
  • 4
    @flybywire If you read this: Stanford's free Introduction to Databases course has a good section on XML and XPath. – James P. Jun 28 '12 at 12:44
  • 4
    Instead of text(), you can use ".=", for example //a[.='Register here'] – danpop Feb 03 '16 at 14:31
  • 1
    What if I don't know the text? Can I select the nodes which contains `http` or certain keyword? – Alston Jul 29 '18 at 15:09
80

Too late for you, but for anyone else with the same question...

//a[contains(text(), 'programming')]/@href

Of course, 'programming' can be any text fragment.
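A rough Python equivalent of the substring match, as a sketch only. The standard library's ElementTree does not implement `contains()`, so the filter is done in plain Python here. One difference to be aware of: in XPath 1.0, `contains(text(), ...)` only looks at the first text node of the link, while this sketch checks all text inside the link, which is closer to `contains(., ...)`. The `<div>` wrapper and the helper name `hrefs_for_text_fragment` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <div> root.
XHTML = """<div>
<a href="http://stackoverflow.com">programming questions site</a>
<a href="http://cnn.com">news</a>
</div>"""


def hrefs_for_text_fragment(markup, fragment):
    """Return the href of every <a> whose text contains fragment."""
    root = ET.fromstring(markup)
    return [
        a.get("href")
        for a in root.iter("a")
        # itertext() yields all text inside the link, including nested elements
        if fragment in "".join(a.itertext())
    ]


print(hrefs_for_text_fragment(XHTML, "programming"))
```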

David Moles
MaDeuce
10
//a[text()='programming questions site']/@href

which basically identifies an anchor node <a> that has the text you want, and extracts the href attribute.

David Moles
Brian Agnew
6

Think of the phrase in the square brackets as a WHERE clause in SQL.

So this query says: select the href attribute (@) of an a tag that appears anywhere (//), but only where (the bracketed phrase) the text content of the a tag is equal to 'programming questions site'.

Peter Mortensen
Baxter Tidwell
4

For a case-insensitive contains, use the following:

//a[contains(translate(text(),'PROGRAMMING','programming'), 'programming')]/@href

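translate replaces each capital letter listed in 'PROGRAMMING' with the lowercase letter at the same position in 'programming'. Only those letters are converted, which is enough to match the word programming in any mix of upper and lower case.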
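To see what that character mapping does, here is a small Python sketch that mimics `translate()` with `str.maketrans`. It is an analogue for illustration, not an XPath evaluation, and the helper names `fold` and `link_text_matches` are made up.

```python
# Character-for-character mapping, like translate(., 'PROGRAMMING', 'programming').
TABLE = str.maketrans("PROGRAMMING", "programming")


def fold(text):
    """Lowercase only the letters that appear in 'PROGRAMMING'."""
    return text.translate(TABLE)


def link_text_matches(link_text, needle="programming"):
    """Mimic contains(translate(...), 'programming') for a single link text."""
    return needle in fold(link_text)


print(fold("PROGRAMMING Questions Site"))  # programming Questions Site
print(link_text_matches("Programming questions site"))  # True
print(link_text_matches("News"))  # False
```

Note that `Q` and `S` are left alone because they are not in the mapping. To fold arbitrary text, the usual XPath 1.0 approach is to pass the whole alphabet in both cases to `translate()`.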

David Moles
Abdo
1

If you are using Html Agility Pack, use GetAttributeValue:

$doc2.DocumentNode.SelectNodes("//div[@class='className']/div[@class='InternalClass']/a[@class='InternalClass']").GetAttributeValue("href","")
Adi Lester
Miguel Vaz