I understand there are many XPath href questions, but none of them fit my case, or I'm too much of a beginner to see what's wrong with my code. Please bear with me if this is a silly question.

I have this HTML structure:

<td valign="top">08-Jan-14 16:02</td>
<td valign="top"><span style="cursor:help;" title="Regulatory News Service">RNS</span></td>
<td valign="top"><a href="share-regulatory-news.asp?shareprice=BARC&amp;ArticleCode=d6rr2uxo&amp;ArticleHeadline=Blocklisting_Interim_Review" class="linkStoryHeadline rnsArticle" title="Blocklisting Interim Review">Blocklisting Interim Review</a></td>
<td valign="top">Company Announcement - General</td>

My code is:

HtmlNodeCollection cols5 = rows[i].SelectNodes(".//td[3]/a[@href]");

I use a StreamWriter to write the URL:

sw.WriteLine(cols5[j].InnerText);

The output is the link text, Blocklisting Interim Review, instead of the URL. Can anyone take a look? I've gone through the XPath guide and searched all over but still can't find the answer for my case. Any help would be much appreciated!

merv

1 Answer


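With HtmlAgilityPack, an XPath query passed to `SelectNodes` or `SelectSingleNode` gives you element nodes, not attribute values. Select the `a` element and then read its `href` attribute. The following XPath selects, from the third table cell, an `a` element that has an `href` attribute (the predicate `[@href]` only requires that the attribute exists; it does not select the attribute itself):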

var a = doc.DocumentNode.SelectSingleNode(".//td[3]/a[@href]");
var href = a.Attributes["href"].Value;

Returns

share-regulatory-news.asp?shareprice=BARC&ArticleCode=d6rr2uxo&ArticleHeadline=Blocklisting_Interim_Review
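For a table with several rows, the same idea can be applied per row. The following is only a minimal sketch, assuming the HtmlAgilityPack package is referenced and that each row holds the link in its third cell as in the question; the class name `HrefExample` and the method `ExtractHrefs` are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HrefExample
{
    // Hypothetical helper: collects the href of the link found in the
    // third cell of every table row in the given HTML.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return hrefs;
        }

        foreach (var row in rows)
        {
            // The predicate [@href] only filters for links that have the attribute.
            var link = row.SelectSingleNode("./td[3]/a[@href]");
            if (link == null)
            {
                continue; // this row has no link in its third cell
            }

            // Read the attribute value from the element, not InnerText.
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }

        return hrefs;
    }
}
```

Each string returned by `HrefExample.ExtractHrefs(html)` can then be passed to `sw.WriteLine(...)` in place of `InnerText`.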

Sergey Berezovskiy
  • Hi Sergey, thank you for guiding me. But I have no idea how to edit my code to what you said? Do you mind looking at it https://pastebin.com/X1J4uV72? I've been spending hours checking on this yet learnt nothing. :/ – merv Jan 09 '14 at 06:59
  • @Shyuan don't use `InnerText` of the selected node. That will return the inner text of the `a` element, which is *"Blocklisting Interim Review"*. For `cols5`, get the attribute value, as I have shown – Sergey Berezovskiy Jan 09 '14 at 07:07
  • Can I ask you a silly question? Where do I put the above code? :( – merv Jan 09 '14 at 07:16
  • @Shyuan I already answered it, use that code for `cols5` – Sergey Berezovskiy Jan 09 '14 at 07:18
  • Is this what you mean? `cols5[j].Attributes["href"].Value`? :) – merv Jan 09 '14 at 07:25
  • @Shyuan actually I didn't get why you have a loop for columns there. It looks to me that you should use `SelectSingleNode` instead of `SelectNodes`, and then the inner loop will not be needed. But yes, your last attempt should work (you can also check that `cols5[j]` is not null) – Sergey Berezovskiy Jan 09 '14 at 07:32
  • I am using loop because I have different files to go through, but with same format. :) I hope this makes sense? – merv Jan 09 '14 at 07:34