
Possible Duplicate:
Using an NSXMLParser to parse HTML

I have used NSXMLParser to parse XML files and RSS feeds. What I am unsure about is whether NSXMLParser works only with XML or whether it can parse HTML as well. From a little searching online, it seems that some people do use it to parse HTML.

Are there any limitations or disadvantages to using NSXMLParser with HTML?

Community
Jessica

1 Answer


If your HTML document is well-formed XHTML, then NSXMLParser will parse it. In practice, you probably won't be working with well-formed XHTML, as it's rare in the real world.

Ordinary HTML (including HTML 4 and 5) is generally not well-formed XML, and an XML parser will fail with an error when it reaches the first construct that breaks XML's rules.

Consider the following sample:

<HTML>
<HEAD>
<META http-equiv=content-type content="text/html; charset=UTF-8">
<TITLE>Sample Document</TITLE>
</HEAD>
<BODY>
<H1>Sample Document</h1>
<P>This document will <strong><em>fail</strong></em> as XML.
</BODY>
</HTML>

In the above document, the http-equiv value is not in quotes (<META http-equiv=content-type …), the <META> element is never closed, <H1> and </h1> differ in case, <P> does not have an end tag, and strong and em are not nested correctly. Browsers accept this as HTML, but it is not well-formed XML, so NSXMLParser will report a parse error.
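To see this in practice, here is a minimal sketch in Swift, where NSXMLParser is exposed as XMLParser. The helper name isWellFormedXML and the two sample strings are made up for illustration; only XMLParser(data:), parse() and parserError are Foundation API. It feeds the parser the sample above and a cleaned-up XHTML version of it, and reports whether each one parses.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on non-Apple platforms
#endif

// Hypothetical helper: true when XMLParser reads the whole text without error.
func isWellFormedXML(_ text: String) -> Bool {
    let parser = XMLParser(data: Data(text.utf8))
    // No delegate is needed just to check well-formedness;
    // parse() returns false as soon as a parse error is hit.
    let ok = parser.parse()
    if !ok, let error = parser.parserError {
        print("Parse error near line \(parser.lineNumber): \(error.localizedDescription)")
    }
    return ok
}

// The sample from the answer, unchanged.
let tagSoup = """
<HTML>
<HEAD>
<META http-equiv=content-type content="text/html; charset=UTF-8">
<TITLE>Sample Document</TITLE>
</HEAD>
<BODY>
<H1>Sample Document</h1>
<P>This document will <strong><em>fail</strong></em> as XML.
</BODY>
</HTML>
"""

// The same content rewritten as well-formed XHTML: quoted attributes,
// closed elements, consistent case, properly nested tags.
let xhtml = """
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<title>Sample Document</title>
</head>
<body>
<h1>Sample Document</h1>
<p>This document will <strong><em>pass</em></strong> as XML.</p>
</body>
</html>
"""

print("tag soup parses:", isWellFormedXML(tagSoup))
print("XHTML parses:", isWellFormedXML(xhtml))
```

The first call should fail on the unquoted attribute before it ever reaches the other problems, which is why hand-patching real-world HTML for an XML parser tends to be a losing game.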

Jeffery Thomas