When I use

SelectSingleNode("//meta[@name='keywords']")

it doesn't work, but when I use the same case as in the original document, it works fine:

SelectSingleNode("//meta[@name='Keywords']")

So the question is: how can I make the match case-insensitive?

user270014
kseen

4 Answers

If the case of the actual value is unknown, I think you have to use translate. I believe it's:

SelectSingleNode("//meta[translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='keywords']")

This is a hack, but it's the only option in XPath 1.0 (other than doing the opposite and translating to upper case).
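
To make the translate approach concrete, here is a minimal sketch. It uses System.Xml's XmlDocument as a stand-in for HtmlAgilityPack so that it runs without extra packages; the same XPath string should work with HtmlAgilityPack's SelectSingleNode, assuming it evaluates XPath 1.0 as described above. The names TranslateDemo, AttributeEqualsIgnoreCase and FindMeta are hypothetical helpers, not part of any library. Note that translate only folds the letters you list, so this covers ASCII letters only, and the helper does not escape apostrophes in the value.

```csharp
using System;
using System.Xml;

public static class TranslateDemo
{
    private const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const string Lower = "abcdefghijklmnopqrstuvwxyz";

    // Hypothetical helper: builds an XPath 1.0 predicate that lower-cases the
    // attribute with translate() before comparing it (ASCII letters only).
    // Assumption: lowerCaseValue contains no apostrophes.
    public static string AttributeEqualsIgnoreCase(string attribute, string lowerCaseValue)
    {
        return "translate(@" + attribute + ",'" + Upper + "','" + Lower + "')='" + lowerCaseValue + "'";
    }

    // Finds the first <meta> element whose name attribute matches, ignoring case.
    public static XmlNode FindMeta(string xml, string name)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        string predicate = AttributeEqualsIgnoreCase("name", name.ToLowerInvariant());
        return doc.SelectSingleNode("//meta[" + predicate + "]");
    }

    public static void Main()
    {
        const string xml = "<html><meta name=\"KeyWords\" content=\"HTML, CSS, XML\" /></html>";
        XmlNode node = FindMeta(xml, "keywords");
        // Prints "HTML, CSS, XML"
        Console.WriteLine(node != null ? node.Attributes["content"].Value : "(no match)");
    }
}
```

Lower-casing the search value in code, rather than inside the XPath, keeps the expression shorter and means only the attribute side needs translate.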

Matthew Flaschen

If you need a more comprehensive solution, you can write an extension function for the XPath processor which will perform a case insensitive comparison. It is quite a bit of code, but you only write it once.

After implementing the extension, you can write your query as follows:

"//meta[@name[Extensions:CaseInsensitiveComparison('Keywords')]]"

Where Extensions:CaseInsensitiveComparison is the extension function implemented in the sample below.

NOTE: This is not well tested. I just threw it together for this response, so error handling etc. is non-existent!

The following is the code for the custom XSLT context, which provides one or more extension functions:

using System;
using System.Xml.XPath;
using System.Xml.Xsl;
using System.Xml;
using HtmlAgilityPack;

public class XsltCustomContext : XsltContext
{
  public const string NamespaceUri = "http://XsltCustomContext";

  public XsltCustomContext()
  {
  }

  public XsltCustomContext(NameTable nt) 
    : base(nt)
  {    
  }

  public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] ArgTypes)
  {
    // Check that the function prefix is for the correct namespace
    if (this.LookupNamespace(prefix) == NamespaceUri)
    {
      // Lookup the function and return the appropriate IXsltContextFunction implementation
      switch (name)
      {
        case "CaseInsensitiveComparison":
          return CaseInsensitiveComparison.Instance;
      }
    }

    return null;
  }

  public override IXsltContextVariable ResolveVariable(string prefix, string name)
  {
    return null;
  }

  public override int CompareDocument(string baseUri, string nextbaseUri)
  {
    return 0;
  }

  public override bool PreserveWhitespace(XPathNavigator node)
  {
    return false;
  }

  public override bool Whitespace
  {
    get { return true; }
  }

  // Class implementing the XSLT Function for Case Insensitive Comparison
  class CaseInsensitiveComparison : IXsltContextFunction
  {
    private static XPathResultType[] _argTypes = new XPathResultType[] { XPathResultType.String };
    private static CaseInsensitiveComparison _instance = new CaseInsensitiveComparison();

    public static CaseInsensitiveComparison Instance
    {
      get { return _instance; }
    }      

    #region IXsltContextFunction Members

    public XPathResultType[] ArgTypes
    {
      get { return _argTypes; }
    }

    public int Maxargs
    {
      get { return 1; }
    }

    public int Minargs
    {
      get { return 1; }
    }

    public XPathResultType ReturnType
    {
      get { return XPathResultType.Boolean; }
    }

    public object Invoke(XsltContext xsltContext, object[] args, XPathNavigator navigator)
    {                
      // Perform the function of comparing the current element to the string argument
      // NOTE: You should add some error checking here.
      string text = args[0] as string;
      return string.Equals(navigator.Value, text, StringComparison.InvariantCultureIgnoreCase);        
    }
    #endregion
  }
}

You can then use the above extension function in your XPath queries. Here is an example for our case:

class Program
{
  static string html = "<html><meta name=\"keywords\" content=\"HTML, CSS, XML\" /></html>";

  static void Main(string[] args)
  {
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    XPathNavigator nav = doc.CreateNavigator();

    // Create the custom context and add the namespace to the context
    XsltCustomContext ctx = new XsltCustomContext(new NameTable());
    ctx.AddNamespace("Extensions", XsltCustomContext.NamespaceUri);

    // Build the XPath query using the new function
    XPathExpression xpath = 
      XPathExpression.Compile("//meta[@name[Extensions:CaseInsensitiveComparison('Keywords')]]");

    // Set the context for the XPath expression to the custom context containing the 
    // extensions
    xpath.SetContext(ctx);

    var element = nav.SelectSingleNode(xpath);

    // Now we have the element
  }
}
Chris Taylor

This is how I do it:

HtmlNodeCollection metaNodes = document.DocumentNode.SelectNodes("//meta[@name='description' or @name='Description' or @name='DESCRIPTION']");

// SelectNodes returns null when nothing matches; GetAttributeValue falls back to "" if content is missing
string metaDescription = metaNodes != null ? HttpUtility.HtmlDecode(metaNodes[0].GetAttributeValue("content", string.Empty)) : string.Empty;
formatc
  • 1
    Your approach isn't as universal as Chris Taylor's. Chris's answer takes into account any combination of character case. – kseen May 14 '12 at 03:07
  • 2
    @kseen I know, but really, is it likely that someone would write something like "KeYwOrDs"? These are the three common variants, and if someone writes a meta name like that, I doubt you will be able to parse anything from that HTML document. This is an out-of-the-box solution that requires two lines of code and works well for most cases, but it all depends on your requirements. – formatc May 14 '12 at 11:41
  • 1
    I try to follow the rule "never trust user input", and I'd kindly advise you to do the same. – kseen May 14 '12 at 12:21

Alternatively, use LINQ, which lets you do the case-insensitive comparison in C# rather than in XPath:

        node = doc.DocumentNode.Descendants("meta")
            .Where(meta => meta.Attributes["name"] != null)
            .Where(meta => string.Equals(meta.Attributes["name"].Value, "keywords", StringComparison.OrdinalIgnoreCase))
            .Single();

But you have to do an ugly null check for the attributes in order to prevent a NullReferenceException...
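
One way to avoid the explicit null check is a sketch along the following lines, which relies on HtmlAgilityPack's GetAttributeValue(name, default) to supply an empty string when the attribute is absent. MetaLookup and FindMetaIgnoreCase are hypothetical names used for illustration. Unlike the snippet above, this variant uses FirstOrDefault instead of Single, so it returns null rather than throwing when no element (or more than one) matches.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class MetaLookup
{
    // Hypothetical helper: returns the first <meta> element whose name attribute
    // equals the given name, ignoring case, or null if there is none.
    public static HtmlNode FindMetaIgnoreCase(HtmlDocument doc, string name)
    {
        return doc.DocumentNode
            .Descendants("meta")
            .FirstOrDefault(meta => string.Equals(
                // Falls back to "" when the element has no name attribute,
                // so no separate null check is needed.
                meta.GetAttributeValue("name", string.Empty),
                name,
                StringComparison.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><head><meta name=\"KeyWords\" content=\"HTML, CSS, XML\"></head></html>");

        HtmlNode node = FindMetaIgnoreCase(doc, "keywords");
        Console.WriteLine(node != null
            ? node.GetAttributeValue("content", string.Empty)
            : "(no match)");
    }
}
```

Because the comparison happens in C#, this handles any mix of upper and lower case without listing alphabets in the query.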

jessehouwing