I'm trying to write an XML parser to load Collada files. So far I have a recursive function that loads each XML node, defined as follows:

XMLnode* XMLparser::loadNode(std::vector<std::string> lines, unsigned int level) {  

    assert(level < lines.size());
    std::string line = trim(lines[level]);

    // if the line starts with a closing tag (</...), it closes a previous node,
    // so don't create another node
    if (line.rfind("</", 0) == 0) {

        // before returning, load child nodes.
        // recursion condition: process the next line of the file only if it exists
        // (i.e. level must be less than the total number of lines in the file)
        if (level < lines.size() - 1) {
            level++; XMLnode* node = loadNode(lines, level);
        }

        return nullptr;

    }


    std::string startingTag = g_startTag(line);
    std::vector<std::string> startingTagParts;


    // recursion condition: process the next line of the file only if it exists
    // (i.e. level must be less than the total number of lines in the file)
    if (level < lines.size() - 1) {
        level++; XMLnode* node = loadNode(lines, level);
    }
    
    return nullptr;

}

where XMLnode is a custom class, all functions (including loadNode) are static, and trim is a custom function that removes leading and trailing spaces from a string. The loadNode function takes a vector of strings containing the lines of the XML file. g_startTag returns the start tag of the line being processed (for example, if a line starts with <source id="Cube-mesh-colors-Col" name="Col">, it returns the content of that start tag without the < and > symbols). The issue is that, whenever I declare a vector of strings to hold the result of splitting the startingTag string, the program crashes after a certain number of lines have been processed, with the following exception:

Exception not handled at 0x00007FF6CB96597F in "project_name": 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000001D92803FB8).

I assume this is caused by the recursion, which continuously declares a new vector. Moreover, if I call the split function, another exception is raised (which I suppose is similar):

Exception not handled at 0x00007FFCC8408739 (ucrtbased.dll) in "project_name": 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x000000C84C2B3FE8).

By the way, the split function I've defined is the following:

std::vector<std::string> split(std::string str, const char delimiter, bool removePunctuation= false) {

    std::vector<std::string> result = {};
    
    // using string streams to split the string
    std::istringstream stream(str);
    std::string word;
    while (std::getline(stream, word, delimiter)) {

        // add the new word to the vector of words
        if (removePunctuation)
            word.erase(std::remove_if(word.begin(), word.end(), ispunct), word.end());
        result.push_back(word);

    }

    return result;

}
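
For reference, here is a minimal self-contained sketch of how split behaves on the start tag from the example above. The headers, the const reference parameter, the unsigned char lambda around std::ispunct, and the helper name splitExampleTag are assumptions added for illustration; they are not part of the original code:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Same logic as the split function above, with the headers it needs.
std::vector<std::string> split(const std::string& str, char delimiter,
                               bool removePunctuation = false) {
    std::vector<std::string> result;
    std::istringstream stream(str);
    std::string word;
    while (std::getline(stream, word, delimiter)) {
        if (removePunctuation) {
            // Taking the character as unsigned char avoids undefined
            // behaviour in std::ispunct for negative char values.
            word.erase(std::remove_if(word.begin(), word.end(),
                                      [](unsigned char c) { return std::ispunct(c) != 0; }),
                       word.end());
        }
        result.push_back(word);
    }
    return result;
}

// Hypothetical helper, only for this example: splits the start tag
// quoted in the question into the tag name and its attributes.
std::vector<std::string> splitExampleTag() {
    return split("source id=\"Cube-mesh-colors-Col\" name=\"Col\"", ' ');
}
```

Splitting on a space assumes that attribute values never contain spaces, which holds for this tag but not for XML in general.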

The code I used to debug the loadNode function is simply a print statement:

std::cout << "level: " << level << "; starting tag: " << startingTag << std::endl;

With the split statement included, the level reached before the crash is clearly lower. When I print the result of the split,

std::cout << "level: " << level << "; starting tag part 0: " << split(g_startTag(line), ' ')[0] << std::endl;

the same happens: a certain number of strings are processed correctly, then the program crashes.
When I include the split statement I don't explicitly declare a new vector in each loadNode call, but a temporary vector of strings is declared inside the split function.
Is there any way to fix this issue?

Thank you in advance for your help, and excuse my poor English: I'm still practising it!

Luke__
  • With `loadNode(std::vector<std::string> lines, /*...*/)` you copy the full vector at each recursion... you probably want to pass by const reference instead (reducing memory usage). – Jarod42 Aug 20 '22 at 21:39
  • Thank you for your answer. Since I'm not quite comfortable with C++, how exactly could I do that? Do you mean something like loadNode(std::vector<std::string>, const std::vector<std::string>& startingTagParts, ...)? – Luke__ Aug 20 '22 at 21:41
  • -> `loadNode(const std::vector<std::string>& lines, /*...*/)` – Jarod42 Aug 20 '22 at 21:42
  • But if I declare a vector of strings inside the function it can cause the same problem, even if I use your approach, right? How can I avoid this? Should I pass the other vectors I need by reference? I've already tried to declare the vectors I need as static private members, but it didn't quite work... – Luke__ Aug 20 '22 at 21:44
  • Another question: are the vectors I declare in outer functions also copied during recursion? – Luke__ Aug 20 '22 at 21:49
  • `loadNode` never returns anything other than `nullptr` – 463035818_is_not_an_ai Aug 20 '22 at 21:49
  • Yes, because I haven't implemented the whole function yet, since I stumbled on this problem – Luke__ Aug 20 '22 at 21:49
  • I tried to implement your suggestion @Jarod42, but it still doesn't work when I include the split statement in the loadNode function – Luke__ Aug 20 '22 at 21:53
  • What other code should I include? – Luke__ Aug 21 '22 at 07:47
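
The const reference suggestion from the comments can be sketched as follows. This is an illustration under assumptions, not the asker's code: visitRecursive and visitIterative are hypothetical stand-ins for loadNode. It also shows why the signature change alone may not be enough: with one recursive call per line, the call depth still equals the number of lines, which is the most likely source of the stack overflow on a long file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for loadNode: one recursive call per line.
// The vector is taken by const reference, so the file is not copied on
// each call, but the call depth still grows with the number of lines.
std::size_t visitRecursive(const std::vector<std::string>& lines, std::size_t level) {
    if (level >= lines.size()) return 0;
    return 1 + visitRecursive(lines, level + 1);
}

// The same linear traversal written as a loop: stack usage stays
// constant no matter how many lines the file has.
std::size_t visitIterative(const std::vector<std::string>& lines) {
    std::size_t visited = 0;
    for (const std::string& line : lines) {
        (void)line;  // a real parser would process the line here
        ++visited;
    }
    return visited;
}
```

Both functions visit every line once; only the second one is safe for arbitrarily long inputs, because it never nests calls.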

1 Answer

I've solved the issue: I've rewritten the entire class so that its functions belong to a particular XMLparser object; they are no longer static. Moreover, every variable that was previously declared inside loadNode is now a private class member. To further decrease memory usage, loadNode now accepts just one parameter, an unsigned int& which stores the recursion level in the node hierarchy (that is, the index of the line currently being processed), while the vector of lines holding the content of the XML file became a private member as well. The major change to the function is the recursion condition, which is now hierarchical: a recursive call is made only for child nodes, so the recursion depth follows the nesting depth of the XML rather than the number of lines. The loadNode function now looks something like this:

XMLnode* XMLparser::loadNode(unsigned int& level) {

    // level must be a valid index into the lines of the file
    assert(level < lines.size());

    // get the current line
    currentLine = trim(lines[level]);

    // if the current line starts with the </... pattern (a closing tag), return
    // nullptr: it closes another node, so there is no new node to create here
    if (currentLine.rfind("</", 0) == 0) {
        return nullptr;
    }
    
    // process the line: get its start tag, strip any '/' character, and split it
    // into the tag name followed by the name="value" attribute pairs
    currentStartTag = g_startTag(currentLine);
    currentStartTag.erase(std::remove(currentStartTag.begin(), currentStartTag.end(), '/'), currentStartTag.end());
    currentStartTagParts = split(currentStartTag, ' ');

    // create a new node object
    XMLnode* node = new XMLnode(currentStartTagParts[0]);
    loadAttribs(currentStartTagParts, node);
    loadData(currentLine, node);

    // if the node is closed on the same line, return node
    if (std::regex_search(currentLine.cbegin(), currentLine.cend(), std::regex(CLOSE_TAG))) {
        return node;
    }
    
    // recursive step: if a following line exists (i.e. level < lines.size() - 1),
    // read and process the following lines as child nodes of the current node until
    // loadNode returns an invalid node (nullptr), which means the current node has no
    // more children. This is repeated for each node in the hierarchy, so calling
    // loadNode on the root node processes the entire hierarchy.
    if (level < lines.size() - 1) {

        // process the following node as the first child node of the current
        level++;

        // process child nodes
        XMLnode* child = nullptr;
        while ((child = loadNode(level)) != nullptr) {

            // loadNode returned a valid child: attach it to the current node,
            // then move on to the following line
            node->a_child(child);
            if (level < lines.size() - 1) level++;

        }

    }

    // at the end of the recursive processing, return the current node
    return node;
    
}

The code is inspired by a Java implementation by a YouTuber, written to demonstrate how to implement skeletal animation in OpenGL.
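
For comparison, here is a minimal self-contained sketch of the same hierarchical recursion that keeps its state in locals and passes the lines by const reference instead of using class members. Node and parseNode are hypothetical names, ownership through std::unique_ptr (C++14) is a swapped-in choice rather than what the code above does, and the input format (already trimmed lines, one tag per line, no XML prolog) is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal node type standing in for XMLnode.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

// Assumes each line is already trimmed and holds exactly one of:
//   <name ...>                           opening tag, children on later lines
//   <name .../> or <name ...>data</name> node closed on the same line
//   </name>                              closing tag
// On return, `index` refers to the last line consumed by this call.
std::unique_ptr<Node> parseNode(const std::vector<std::string>& lines, std::size_t& index) {
    if (index >= lines.size()) return nullptr;
    const std::string& line = lines[index];
    if (line.rfind("</", 0) == 0) return nullptr;  // closing tag: no node here

    auto node = std::make_unique<Node>();
    // The name runs from just after '<' up to the first space, '/' or '>'.
    const std::size_t end = line.find_first_of(" />", 1);
    node->name = line.substr(1, end == std::string::npos ? std::string::npos : end - 1);

    // Closed on the same line: there are no children to read.
    const bool selfClosed =
        line.size() >= 2 && line.compare(line.size() - 2, 2, "/>") == 0;
    if (selfClosed || line.find("</") != std::string::npos) return node;

    // Children: recursion depth grows with nesting depth, not with line count.
    while (index + 1 < lines.size()) {
        ++index;
        std::unique_ptr<Node> child = parseNode(lines, index);
        if (!child) break;  // reached this node's closing tag
        node->children.push_back(std::move(child));
    }
    return node;
}
```

Because every node owns its children through std::unique_ptr, destroying the root releases the whole tree, so no manual delete is needed.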

Luke__