11

I have an HTML document that I parse with XPath. I want to get the value of the input element, but it doesn't work.

My HTML:

<tbody>
  <tr>
    <td>
      <input type="text" name="item" value="10743" readonly="readonly" size="10"/>
    </td>
  </tr>
</tbody>

My code:

using HtmlAgilityPack;

HtmlAgilityPack.HtmlDocument doc; 
HtmlWeb hw = new HtmlWeb();
HtmlNodeCollection node = doc.DocumentNode.SelectNodes("//input/@value");
string s=node[0].InnerText;

So I want to get the value "10743" (I don't mind getting other tags along with the answer).

Chani Poz
  • No, because I want to get the value via `node[0].InnerText` – Chani Poz Dec 29 '11 at 10:59
  • But an attribute does not have an `InnerText`. – Oded Dec 29 '11 at 12:16
  • Yes, that is my problem: I must use `InnerText` because this is in a loop. But I don't mind getting the whole text, including the element. But how? – Chani Poz Dec 29 '11 at 12:26
  • I guess you are not getting my point. You are selecting _attributes_. These only have a `Value`, not an `InnerText`. – Oded Dec 29 '11 at 12:27
  • Yes, I got it, but is there no other way? A trick or something like that? I could even get other tags that I don't need in addition to the value, so that the element becomes the InnerText of another tag. – Chani Poz Dec 29 '11 at 12:38
  • @Chanipoz: You forgot to tell us what exactly you want to get as the result of evaluating the XPath expression: an object, a string,..., which string exactly...? – Dimitre Novatchev Dec 29 '11 at 13:09
  • I added in the question what I exactly want. – Chani Poz Dec 29 '11 at 14:41
  • @Chanipoz: Could you please provide the complete C# code and the complete HTML document? From the code snippet it seems that you aren't using HtmlAgilityPack at all. – Dimitre Novatchev Dec 29 '11 at 15:17

3 Answers

17

You can get it from the node's Attributes collection:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("file.html");
var node = doc.DocumentNode.SelectNodes("//input")[0];
var val = node.Attributes["value"].Value; // 10743
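
Since the question mentions doing this in a loop, the same idea could be extended to every matching input. What follows is only a minimal sketch, not part of the original answer. It assumes the HtmlAgilityPack package is referenced and uses an inline HTML string (a stand-in for file.html). It reads the attribute with `GetAttributeValue` and an empty-string default, and it checks for null because `SelectNodes` returns null when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

public static class InputValues
{
    public static void Main()
    {
        // Assumption: inline markup standing in for the asker's real document.
        const string html =
            "<table><tbody><tr><td>" +
            "<input type=\"text\" name=\"item\" value=\"10743\" readonly=\"readonly\" size=\"10\"/>" +
            "</td></tr></tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the <input> elements themselves, restricted to those that
        // carry a value attribute, rather than selecting the attributes.
        HtmlNodeCollection inputs = doc.DocumentNode.SelectNodes("//input[@value]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (inputs == null)
        {
            Console.WriteLine("No input elements with a value attribute were found.");
            return;
        }

        foreach (HtmlNode input in inputs)
        {
            // Falls back to the empty string if the attribute is missing.
            string value = input.GetAttributeValue("value", string.Empty);
            Console.WriteLine(value);
        }
    }
}
```

Selecting the elements and reading the attribute from each one avoids relying on `InnerText`, which, as the comments on the question point out, an attribute does not have.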
Kakashi
7

You can also grab the attribute directly if you use the HtmlNodeNavigator.

//Load document from some html string
HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(htmlContent);

//load navigator for current document
HtmlNodeNavigator navigator = (HtmlNodeNavigator)hdoc.CreateNavigator();

//Get value with given xpath
string xpath = "//input/@value";
string val = navigator.SelectSingleNode(xpath).Value;
Pierluc SS
7

Update2: Here is a code example showing how to get the values of attributes using Html Agility Pack:

http://htmlagilitypack.codeplex.com/wikipage?title=Examples

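HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Select every <a> element that has an href attribute
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink is a user-supplied helper, not part of the library
}
doc.Save("file.htm");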

You obviously need to adapt this code to your needs -- for example, you will not modify the attributes, but will just use att.Value.


Update: You may also look at this question:

Selecting attribute values with html Agility Pack


Your problem is most likely a default namespace problem -- search for "XPath default namespace c#" and you will find many good solutions (hint: use the overload of SelectNodes() that has an XmlNamespaceManager argument).

The following code shows what one gets for an attribute in a document in "no namespace":

using System;
using System.IO;
using System.Xml;

public class Sample
{

    public static void Main()
    {

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<input value='novel' ISBN='1-861001-57-5'>" +
                    "<title>Pride And Prejudice</title>" +
                    "</input>");

        XmlNode root = doc.DocumentElement;

        XmlNode value = doc.SelectNodes("//input/@value")[0];

        Console.WriteLine("Inner text: " + value.InnerText);
        Console.WriteLine("InnerXml: " + value.InnerXml);
        Console.WriteLine("OuterXml: " + value.OuterXml);
        Console.WriteLine("Value: " + value.Value);

    }
}

The result from running this app is:

Inner text: novel
InnerXml: novel
OuterXml: value="novel"
Value: novel

Now, for a document that is in a default namespace:

using System;
using System.IO;
using System.Xml;

public class Sample
{

    public static void Main()
    {

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<input xmlns='some:Namespace' value='novel' ISBN='1-861001-57-5'>" +
                    "<title>Pride And Prejudice</title>" +
                    "</input>");

        XmlNode root = doc.DocumentElement;

        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("x", "some:Namespace");

        XmlNode value = doc.SelectNodes("//x:input/@value", nsmgr)[0];

        Console.WriteLine("Inner text: " + value.InnerText);
        Console.WriteLine("InnerXml: " + value.InnerXml);
        Console.WriteLine("OuterXml: " + value.OuterXml);
        Console.WriteLine("Value: " + value.Value);

    }
}

Running this app again produces the wanted results:

Inner text: novel
InnerXml: novel
OuterXml: value="novel"
Value: novel
Brian
Dimitre Novatchev
  • Thanks, but that is not the problem. My doc is HTML, and other XPath expressions work fine; only this one fails, because it does not do what I intend. I need to find another XPath, but I have no idea which. – Chani Poz Dec 29 '11 at 14:49
  • Wasn't I clear? Anyway, I added **all** my code and wrote what I want: the string "**10743**" (the value of the input node) – Chani Poz Dec 29 '11 at 16:18
  • @Chanipoz: Have a look at my second update -- a code sample showing exactly how to obtain the value of an attribute using Html Agility Pack -- something you can easily adapt to your needs. – Dimitre Novatchev Dec 29 '11 at 16:30