XPath expression to select href value from link

Question

I have the following HTML:

<a class="cat" href="/Home/txtdata0/">txtdata0</a>
<a class="cat" href="/Home/txtdata1/">txtdata1</a>
<a class="cat" href="/Home/txtdata2/">txtdata2</a>
<a class="cat" href="/Home/txtdata3/">txtdata3</a>

To get the text of every link, I use this XPath (written as a C# string literal in Visual Studio):

.//a[@class=\"cat\"]

To get the href value of every link, I use this XPath (again as a C# string literal):

.//a[@class=\"cat\"]/@href

The XPath Helper extension in Google Chrome shows correct results for both expressions (.//a[@class="cat"] and .//a[@class="cat"]/@href):

txtdata0
txtdata1
txtdata2
txtdata3

and

/Home/txtdata0/
/Home/txtdata1/
/Home/txtdata2/
/Home/txtdata3/

In Visual Studio, the XPath .//a[@class=\"cat\"] gives:

txtdata0
txtdata1
txtdata2
txtdata3

and the XPath .//a[@class=\"cat\"]/@href gives:

txtdata0
txtdata1
txtdata2
txtdata3

Why is the second output the same as the first?

Program code:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(seturl);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;

    if (response.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
    }

    data = readStream.ReadToEnd();

    response.Close();
    readStream.Close();
}

doc.LoadHtml(data);

HtmlAgilityPack.HtmlNodeCollection bodynode = doc.DocumentNode.SelectNodes(".//a[@class=\"cat\"]");
HtmlAgilityPack.HtmlNodeCollection bodynod = doc.DocumentNode.SelectNodes(".//a[@class=\"cat\"]/@href");
MessageBox.Show(bodynode.Count.ToString());
MessageBox.Show(bodynod.Count.ToString());

for (int i = 0; i < bodynode.Count; i++)
{
    MessageBox.Show(bodynode[i].InnerText.ToString() + " - " + bodynod[i].InnerText.ToString());
}

Looks like HTML Agility Pack cannot match attribute nodes, and returns their parent elements instead. You will need an expert to confirm that, though. — Frédéric Hamidi, Mar 05 '15 at 13:35
[Attributes do not have an innerText, only a Value](http://stackoverflow.com/questions/8666902/get-a-value-of-an-attribute-by-xpath-and-htmlagilitypack#comment10772717_8666902). — Mathias Müller, Mar 05 '15 at 13:47
Thanks Mathias, I added bodynod[i].Attributes["href"].Value.ToString() and it works! — Vytas P., Mar 05 '15 at 13:54

score 2 · Accepted Answer · answered Mar 05 '15 at 13:43

If I remember correctly, attributes in HAP (HTML Agility Pack) can be extracted like this:

 string _tmpUrl = documentUrl.DocumentNode.SelectNodes("//a[@class='cat']")[i].Attributes["href"].Value;
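
For reference, here is a minimal self-contained sketch of the same idea, assuming the HtmlAgilityPack NuGet package is referenced. The class name HrefDemo and the inline HTML string are only illustrative; the markup is copied from the question. As the comments on the question suggest, SelectNodes appears to hand back the <a> elements even when the XPath ends in /@href, so the sketch selects the elements and reads the attribute from each one.

```csharp
using System;
using HtmlAgilityPack;

class HrefDemo
{
    static void Main()
    {
        // Sample markup copied from the question (illustrative only).
        const string html =
            "<a class=\"cat\" href=\"/Home/txtdata0/\">txtdata0</a>" +
            "<a class=\"cat\" href=\"/Home/txtdata1/\">txtdata1</a>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the <a> elements themselves rather than their attributes.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@class='cat']");

        // SelectNodes returns null when nothing matches, so guard before looping.
        if (links == null)
        {
            Console.WriteLine("No matching links found.");
            return;
        }

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue falls back to the default if href is missing.
            string href = link.GetAttributeValue("href", string.Empty);
            Console.WriteLine(link.InnerText + " - " + href);
        }
    }
}
```

GetAttributeValue is used here instead of Attributes["href"].Value so that a link without an href yields an empty string rather than a null reference exception.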

Helmer
