-1

When parsing with tinyXML or rapidXML, I get the following error whenever a stray "<" character is put into the XML file:

Process returned -1073741819 (0xC0000005)   execution time : 2.335 s
Press any key to continue. 

Do you know how to avoid this behavior?

#include <iostream>
#include "tinyxml2.h"

using namespace std;

int main()
{
    tinyxml2::XMLDocument doc;
    doc.LoadFile("my.xml");

    tinyxml2::XMLElement* element;
    tinyxml2::XMLNode* node;

    node = doc.FirstChildElement("root") -> FirstChildElement("sample");

    while (node != NULL)
    {
        cout << "--START--" << endl;
        element = node -> FirstChildElement("field0");

        while(element != NULL)
        {
            const char* title = element -> GetText();

            if (title != NULL)
                cout << ":: " << title << endl;
            else
                cout << ":: NULL" << endl;

            element = element -> NextSiblingElement();

        }

        cout << "---END---" << endl << endl;

        node = node -> NextSibling();

    }

    return 0;

} 

The my.xml file looks something like this. Note the <crash> tag; putting only a lone < character in place of <crash> crashes the program as well:

<root>
<sample>
    <field0><crash>1</field0>
    <field1>2</field1>
    <field2>3</field2>
    <field3>4</field3>
    <field4>5</field4>
    <field5>6</field5>
    <field6>7</field6>
    <field7>8</field7>
</sample>
</root> 

gdb output:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402c5e in tinyxml2::XMLNode::FirstChildElement(char const*) const ()
(gdb) where
#0  0x0000000000402c5e in tinyxml2::XMLNode::FirstChildElement(char const*) const ()
#1  0x00000000004013f7 in tinyxml2::XMLNode::FirstChildElement(char const*) ()
#2  0x0000000000401236 in main ()

I should add that I get the same crash with the rapidXML library.

Robert Harvey
neutrino

2 Answers

2

The problem is in your code, not in the libraries:

 node = doc.FirstChildElement("root") -> FirstChildElement("sample");

If presented with an invalid XML file, doc.FirstChildElement("root") is likely to return NULL. You then dereference that NULL pointer to call FirstChildElement("sample"), which matches the segmentation fault in your gdb trace.

Neither parser can be expected to 'partially' parse invalid XML in this way. Try this instead:

 node = doc.FirstChildElement("root");
 if (node == NULL)
    // needs #include <stdexcept>
    throw std::runtime_error("no <root> element; is my.xml well-formed?");
 else
    node = node -> FirstChildElement("sample");
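
If the same check is needed in several places, it could be wrapped in a small helper. This is only a sketch: require_non_null is a made-up name and is not part of tinyxml2 or rapidXML.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a tinyxml2 or rapidXML function): returns the
// pointer unchanged if it is non-null, otherwise throws an exception whose
// message names what was missing.
template <typename T>
T* require_non_null(T* ptr, const std::string& what)
{
    if (!ptr)
        throw std::runtime_error("missing " + what);
    return ptr;
}

// Intended use with the code from the question (assumes tinyxml2.h is included):
//   node = require_non_null(doc.FirstChildElement("root"), "<root>")
//              -> FirstChildElement("sample");
```

With this, a missing element turns into an exception you can catch and report, rather than an access violation.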
Roddy
0

Your options are:

  1. Debug TinyXML, find out why it's crashing, and fix it (if you have the source code),
  2. Find an XML validator library and verify that the XML is valid before attempting to parse it, or
  3. Find or write an XML parser that doesn't crash.
Robert Harvey
  • The 2nd suggestion is the best option. Writing a stable, portable and bug-free parser is not an easy task. Reinventing the wheel should be the last option unless it's for learning purposes. – Sceptical Jule Oct 05 '14 at 21:50