
I am using the HTML Agility Pack to great effect and am really impressed with it. However, I am selecting content like so:

doc.DocumentNode.SelectSingleNode("//body").InnerHtml

How do I deal with the following situation, where different documents write the tag with different casing?

<body>
<Body>
<BODY>

Will my code above only get the lower case versions?

YodasMyDad
  • What have you tried? Looks like something that can be tested in a couple of minutes. – Oded Apr 25 '11 at 07:47

1 Answer


The Html Agility Pack handles HTML in a case-insensitive way, so it parses BODY, Body and body identically. This is by design: HTML is not case sensitive (XHTML is).

That said, when you use its XPath feature, you must write tag names in lower case. The "//body" expression will match BODY, Body and body, whereas "//BODY" will match nothing.
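The behaviour described above can be checked with a short console program. This is only a minimal sketch: the three sample documents are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. It relies on `SelectSingleNode` returning null when nothing matches, so the uppercase query is expected to report no match.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

class Program
{
    static void Main()
    {
        // Hypothetical sample documents that differ only in the casing of the body tag.
        string[] samples =
        {
            "<html><body><p>lower</p></body></html>",
            "<html><Body><p>mixed</p></Body></html>",
            "<html><BODY><p>upper</p></BODY></html>",
        };

        foreach (string html in samples)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Lowercase XPath: per the answer, this should match whatever casing the source used.
            HtmlNode lower = doc.DocumentNode.SelectSingleNode("//body");

            // Uppercase XPath: per the answer, this should match nothing (null result).
            HtmlNode upper = doc.DocumentNode.SelectSingleNode("//BODY");

            Console.WriteLine("//body -> {0}", lower?.InnerHtml ?? "(no match)");
            Console.WriteLine("//BODY -> {0}", upper?.InnerHtml ?? "(no match)");
        }
    }
}
```

In practice this means keeping every element name in an XPath expression lower case and null-checking the result before reading `InnerHtml`.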

Simon Mourier
  • @Mark - I actually am the author :-) I did that because XPath is case sensitive (and the translate function is just so impractical!) and HTML is not. I don't see any other nice solution? – Simon Mourier Apr 25 '11 at 16:04
  • I dunno...just lowercase everything if the document is HTML? But leave it if it's XHTML or XML? It's not such a big deal, just something to be aware of I guess :) – mpen Apr 25 '11 at 23:59
  • @Mark - The Html Agility Pack was designed for HTML, not for X(HT)ML, where you can safely use the standard .NET classes. So for HTML, yes, it sort of "lowercases" everything, exactly. – Simon Mourier Apr 26 '11 at 05:52
  • Simon, thanks for the clarification. Got to say it's an excellent library - top job :) – YodasMyDad Apr 27 '11 at 19:06