1

For a project for my c++ class, I am supposed to parse and xml file and build a binary tree from it. The file is much more dense than this but the layout is as follows:

<?xml version="1.0" encoding="utf-8"?>
<MyJournal>
    <species>
        <name>Sea Creature</name>
        <species>
            <name>Fish</name>
            <species>
                <name>swordfish</name>
            </species>
            <species>
                <name>grouper</name>
            </species>
        </species>
        <species>
            <name>Mammal</name>
            <species>
                <name>dolphin</name>
            </species>
            <species>
                <name>whale</name>
            </species>
        </species>
    </species>
    <species>
        <name>Land animal</name>
        <species>
            <name>Mammal</name>
            <species>
                <name>dog</name>
            </species>
            <species>
                <name>cat</name>
            </species>
        </species>
        <species>
            <name>Bird</name>
            <species>
                <name>blue jay</name>
            </species>
            <species>
                <name>robin</name>
            </species>
        </species>
    </species>
</MyJournal>

I'm having a hard time figuring out how to parse this data so that I can build a tree. I was thinking I could use recursion for each branch but I can only get it to get one child. Someone hinted at using a queue to put the data into a tree structure, but I'm not quite sure how I could go through all the levels of the tree using a queue. I feel like recursion is the easiest way to parse the data for each branch, but I just can't figure out how to properly implement a recursive method. Here is the method I tried using. I passed in the root node first:

void loop(xml_node<> *species)
{
    Node t1 = *new Node();
    xml_node<> * name_node = species->first_node("name");
    if(name_node != 0)
    {
        t1.setName(name_node->value());
        cout << name_node->value() << endl;
    }


    xml_node<> * child = species->first_node("species");
    if(child != 0)
    {
        cout << child->first_node("name")->value() << endl;
        if(child->first_node()->next_sibling() != 0)
        {
            loop(child->first_node()->next_sibling());
            xml_node<> * child2 = child->next_sibling();
            cout << child2->first_node()->value() << endl;
            loop(child2->first_node()->next_sibling());
        }
    }

}

It only goes through the first child of each node returning Sea Creature Fish swordfish Land animal Mammal dog

I would really appreciate any pointers in the right direction. Thanks!

  • Land animal is not a species. Use [taxonomic ranks](http://en.wikipedia.org/wiki/Taxonomic_rank) :) – David Mar 07 '13 at 17:07
  • Ya I actually am doing this project for a huge phylogenetic tree and I just dumbed it down to a small basic tree just to get the idea of parsing :) –  Mar 08 '13 at 17:41

2 Answers2

2

To cover all the nodes in this file, you need to look at each node's children and its siblings. You seem to understand that.

Your recursion approach is a workable choice for getting to the children. It seems to be working. Each recursive call to loop goes one level deeper into the children. (Whoever told you to use a "queue" possibly meant "stack" ... and recursion uses a stack implicitly. The call stack.)

It's the siblings that are being missed. And since recursion is being used to go deeper into the XML tree, this probably isn't going to be solved with more recursion.

Look at this code part of your code:

xml_node<> * child = species->first_node("species");
if(child != 0)
{
    cout << child->first_node("name")->value() << endl;

This is where you are finding the first sibling, such as "swordfish".

Try changing that if statement into a loop, so the contained logic is performed on all the siblings instead of just the first one.

Drew Dormann
  • 59,987
  • 13
  • 123
  • 180
  • Thank you for your help. I changed the if statement to this `for(xml_node<> * child = species->first_node("species"); child != 0; child = child->next_sibling())` and I got the output `Sea Creature Fish swordfish grouper Land animal Mammal dog cat Land animal Mammal dog cat`. The loop is going through both children of the fish nodes and through the mammal nodes twice now. Any idea how I should fix this? –  Mar 07 '13 at 22:04
0

I know this question has been answered already but I just want to throw in a helpful hint.

Instead of using recursion, you can use stacks/queues to parse through this XML file to build your "tree" structure.

If you want to stick with recursion, make sure all of the children point to the same parent, otherwise the tree structure will be wrong.