I'm looking for a code example of how to parse HTML using libxml2. The official web page contains only prototype declarations and some descriptions; I need real code examples of XPath and DOM usage with the HTMLParser module. I did a lot of googling but didn't get any satisfactory results: I only found tutorials about XML parsing and many Objective-C examples, and I want pure C.
Asked by Jack
- HTML doesn't have to be well-formed. It is hardly practical to parse HTML with an XML parser. – bioffe Jun 18 '12 at 19:09
- Despite its name, it can parse HTML as well. – Jack Jun 18 '12 at 19:13
- Looking for code examples? Perhaps you have tried the internet? – tbert Jun 18 '12 at 19:20
- @Jack, it seems it can. I didn't know. – bioffe Jun 18 '12 at 19:20
- @tbert: Yes, I am. As I mentioned: "the official web page just contains prototype declarations and some description [...] I did a lot of googling; but I don't get any satisfactory result. I found only tutorials about xml parsing and much obj-c examples(I want to pure C)." – Jack Jun 18 '12 at 19:23
- @bioffe, libxml has both an XML and an HTML parser. – ikegami Jun 18 '12 at 21:53
- @bioffe, HTML does have to be well formed, but it often isn't, since browsers fix a lot of errors. That said, you can recover from those errors. (In the Perl interface to libxml2, it's done by setting the `recover` option.) – ikegami Jun 18 '12 at 21:59