
I am looking for an easy-to-use HTML parser library. Currently I am trying to set up libxml2, but I am running into frustrating problems. The IDE I am using is Pelles C. I took the Windows files for libxml2 and put them in the appropriate folders (headers in the header folder, binaries in bin, libs in the library folder, etc.), but whenever I try to build a program, every libxml2 function I call is reported as undefined. For example:

Linker Flags:

-subsystem:console -machine:amd64 kernel32.lib advapi32.lib delayimp64.lib Ws2_32.lib libxml2.lib

Code:

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* Recursively prints the name of every element node in the tree. */
static void print_element_names(xmlNode *a_node)
{
    xmlNode *cur_node = NULL;

    for (cur_node = a_node; cur_node; cur_node = cur_node->next)
    {
        if (cur_node->type == XML_ELEMENT_NODE)
        {
            printf("node type: Element, name: %s\n", (const char *)cur_node->name);
        }
        print_element_names(cur_node->children);
    }
}

int main(void)
{
    xmlDoc *doc = NULL;
    xmlNode *root_element = NULL;

    LIBXML_TEST_VERSION

    doc = xmlReadFile("XMLFILE"/*XML_FILE PUT HERE*/, NULL, 0);
    if (doc == NULL)  /* parsing failed: report it and stop */
    {
        printf("error: could not parse file\n");
        return 1;
    }

    root_element = xmlDocGetRootElement(doc);
    print_element_names(root_element);
    xmlFreeDoc(doc);
    xmlCleanupParser();

    return 0;
}

This gives me the following errors when I try to build it:

POLINK: error: Unresolved external symbol 'xmlCheckVersion'.
POLINK: error: Unresolved external symbol 'xmlReadFile'.
POLINK: error: Unresolved external symbol 'xmlDocGetRootElement'.
POLINK: error: Unresolved external symbol 'xmlFreeDoc'.
POLINK: error: Unresolved external symbol 'xmlCleanupParser'.
POLINK: fatal error: 5 unresolved external(s).

This whole situation is driving me insane. If anybody could help me resolve this issue, or perhaps suggest an HTML parser that is easier to set up, I would immensely appreciate it.

Keith Miller

3 Answers


Those errors come from the linking stage, not from libxml2 itself: any library set up this way would give you the same errors.

That is, unless you installed the wrong package (e.g. a 64-bit library instead of a 32-bit one, or vice versa).

For XML parsing, libxml2 is quite a useful tool: it is fast and powerful. Seeing as you've already started with it, I'd try to solve the linker problem rather than switch libraries.

LSerni
  • What do you mean by the linking stage? I'm somewhat confused as to what could be causing this issue, then. I'm on a 64-bit machine, running the 64-bit Pelles C IDE. So if it's the linking stage, how could I resolve that? I feel completely in the dark at this point. – Keith Miller Aug 21 '12 at 22:54
  • The Pelles IDE first compiles the C source into object code, and apparently does this without errors. Then it tries to resolve the references by looking into the available libraries, and that's where it's failing. With GCC I sometimes have to specify libraries in a different order; try that, maybe it's the same with Pelles. Otherwise, I'd check whether the binary lib file is built for the right architecture (BTW, shouldn't it be a DLL?). Sorry, I don't know what else to try. But rest assured it's no fault of libxml2 :-) – LSerni Aug 21 '12 at 22:58
  • I know for a fact that it's something I'm doing wrong, but I just don't know what either :P – Keith Miller Aug 21 '12 at 23:00
  • The symbol names appear to be correct and match the library (ftp://ftp.zlatkovic.com/libxml/libxml2-2.7.8.win32.zip). Try changing the linker flags to put libxml2.lib before the other libraries. You did copy all the files from the zip's /lib dir into /lib, didn't you? – LSerni Aug 22 '12 at 10:07
  • Yes, I did, and I tried moving the flags around as well. I tried both libxml2.lib and libxml2_a.lib; unfortunately neither works. I have a backup of all the bins, includes, and libs that came with Pelles C, and a backup of the default files plus the ones I added from libxml2. – Keith Miller Aug 22 '12 at 13:24

I once used Mini-XML. It compiles with ANSI C compilers. http://www.minixml.org/

However, you should be careful, because parsing HTML is not the same as parsing XML. For instance, in HTML you can have tags that are never closed, e.g.:

<img src="foo.jpg">
Claudi

I tried a tool called html2cxx that can parse HTML. It handles HTML and CSS 1.0 well, though it has not been updated for some years.

maoyang