I can't seem to swap the location of parallel nodes/subtrees within a pugixml document

Question

I need to re-sequence the majority of child nodes at one level within my document.

The document has a structure that looks (simplified) like this:

sheet
    table
        row
            parameters
        row
            parameters
        row
            parameters
        row
            cell
                header string
            cell
                header string
            cell
                header string
        data row A
            cell
                data
            cell
                data
            cell    
                data
        data row B
            cell
                data
            cell
                data
            cell    
                data
        data row C
            cell
                data
            cell
                data
            cell    
                data
        data row D
            cell
                data
            cell
                data
            cell    
                data
        data row E
            cell
                data
            cell
                data
            cell    
                data
        row
            parameters
        row
            parameters
        row
            parameters
        row
            parameters
        row
            parameters

I'm currently using pugixml to load, parse, traverse, and access the large XML file, and I ultimately compute a new sequence for the data rows. I know I'm parsing everything correctly and, looking at the resequencing results, I can see that the reading and processing are correct. After all my optimizing and processing, the resequencing solution is a list of indices in a revised order, like { D,A,E,C,B } for the example above. So now I need to actually rearrange the rows into this new order and then output the resulting XML to a new file. The actual data is about 16 MB, with several hundred data row nodes and more than a hundred data elements in each row.
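To make the goal concrete, here is a minimal sketch of what "applying the index list" means, using a plain std::vector of strings as a stand-in for the data rows. The helper name applyOrder is hypothetical and is not part of pugixml; it only illustrates the intended end result, not how the XML nodes are moved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a pugixml function): returns the rows rearranged
// so that result[i] == rows[order[i]].
// Assumption: "order" is a permutation of 0 .. rows.size()-1.
std::vector<std::string> applyOrder(const std::vector<std::string>& rows,
                                    const std::vector<std::size_t>& order)
{
    assert(order.size() == rows.size());

    std::vector<std::string> result;
    result.reserve(rows.size());
    for (std::size_t index : order)
        result.push_back(rows.at(index)); // at() throws on an out-of-range index
    return result;
}
```

With rows { A,B,C,D,E } and order { 3,0,4,2,1 }, this yields { D,A,E,C,B }, which is the order I want the data row nodes to end up in.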

I've written a routine to swap two data rows, but something I'm doing is destroying the xml structural consistency during the swaps. I'm sure I don't understand the way pugi is moving nodes around and/or invalidating node handles.

I create and set aside node handles -- pugi::xml_node -- to the "table" level node, to the "header" row node, and to the "first data" row node, which in the original form above would be node "data row A". I know these handles give me correct access to the right data -- I can pause execution and look into them during the optimization and resequencing calculations and examine the rows and their siblings and see the input order.

The "header row" is always a particular child of the table, and the "first data row" is always the sibling immediately after the "header row". So I set these up when I load the file and check them for data consistency.

My understanding of node::insert_copy_before is this:

pugi::xml_node new_node_handle_in_document = parentnode.insert_copy_before( node_to_be_copied_to_child_of_parent , node_to_be_copied_nodes_next_sibling );

My understanding is that a deep recursive clone of node_to_be_copied_to_child_of_parent with all children and attributes will be inserted as the sibling immediately before node_to_be_copied_nodes_next_sibling, where both are children of parentnode.

Clearly, if node_to_be_copied_nodes_next_sibling is also the "first data row", then the node handle to the first data row may still be valid after the operation, but it will no longer actually be a handle to the first data node. But does calling insert_copy_before on the document change or invalidate other node handles near the insertion point, or does it leave them alone?

So let's look at the code I'm trying to make work:

// a method to switch data rows
bool switchDataRows( int iRow1 , int iRow2 )
{
    // temp vars
    int iloop;

    // navigate to the first row and create a handle that can move along siblings until we find the target
    pugi::xml_node xmnRow1 = m_xmnFirstDataRow;
    for ( iloop = 0 ; iloop < iRow1 ; iloop++ )
        xmnRow1 = xmnRow1.next_sibling();

    // navigate to the second row and create another handle that can move along siblings until we find the target
    pugi::xml_node xmnRow2 = m_xmnFirstDataRow;
    for ( iloop = 0 ; iloop < iRow2 ; iloop++ )
        xmnRow2 = xmnRow2.next_sibling();

    // ok.... so now get convenient handles on the locations of the two nodes by creating handles to the nodes AFTER each
    pugi::xml_node xmnNodeAfterFirstNode = xmnRow1.next_sibling();
    pugi::xml_node xmnNodeAfterSecondNode = xmnRow2.next_sibling();

// at this point I know all the handles I've created are pointing towards the intended data.

    // now copy the second to the location before the first
    pugi::xml_node xmnNewRow2 = m_xmnTableNode.insert_copy_before( xmnRow2 , xmnNodeAfterFirstNode );

// here's where my concern begins. Does this copy do what I want it to do, moving a copy of the second target row
// into the position under the table node as the child immediately before xmnNodeAfterFirstNode? If it does, might
// this operation invalidate other handles to data row nodes? Are all bets off as soon as we do an insert/copy in
// a list of siblings, or will handles to other nodes in that list of children remain valid?

    // now copy the first to the spot before the second
    pugi::xml_node xmnNewRow1 = m_xmnTableNode.insert_copy_before( xmnRow1 , xmnNodeAfterSecondNode );

// clearly, if other handles to data row nodes have been invalidated by the first insert_copy, then these handles aren't any good any more...

    // now delete the old rows
    bool bDidRemoveRow1 = m_xmnTableNode.remove_child( xmnRow1 );
    bool bDidRemoveRow2 = m_xmnTableNode.remove_child( xmnRow2 );

// this is my attempt to remove the original data row nodes after they've been copied to their new locations

    // we have to update the first data row!!!!!
    bool bDidRowUpdate = updateFirstDataRow();  // a routine that starts with the header row node and finds the first sibling, the first data row

// as before, if using the insert_copy methods results in many of the handles moving around, then I won't be able
// to base an update of the "first data row node" handle on the "known" handle to the header data row node.

    // return the result
    return( bDidRemoveRow2 && bDidRemoveRow1 && bDidRowUpdate );
}

As I said, this destroys the structural consistency of the resulting xml. I can save it, but nothing will read it except notepad. The table ends up being somewhat garbled. If I try to use my own program to read it, the reader reports an "element mismatch" error and refuses to load it, understandably.

So I'm doing one or more things wrong. What are they?

I'll have to read it more carefully later, a lot of text! For a short comment that may help - insert_copy or append_copy don't invalidate any nodes. The only operation that invalidates node handles is remove_* (it invalidates all handles to nodes in the removed subtree since it's, ahem, removed). Changing the node does not invalidate handles to nodes in the subtree otherwise - you can think of this as std::map or std::list iterator invalidation. — zeuxcg, Aug 05 '14 at 23:17
The only obvious issue that the code has is that nodes after the items you're swapping don't necessarily exist - you can just use insert_copy_after. Everything else looks fine - also I'm really not sure how you can change the document in a way that makes pugixml output an XML with malformed structure. Can you upload the output document somewhere or e-mail it to me (http://pugixml.org/support/)? — zeuxcg, Aug 05 '14 at 23:28
Thank you. If I have a valid xml document with 300+ nodes at the "row" level, and traverse the list once, each time swapping the current node with some other node, in such a way that the entire list should wind up re-sequenced according to the revised list I started with, then pugi allows me to call document::save_file with a new file name and returns "true". That resulting xml, however, cannot be re-read by pugi. The result is an error with an error message of "end mismatch". — user3307740, Aug 07 '14 at 14:18
Every node/row of interest in the tree should be guaranteed to have a next_sibling, but I will add code to verify that just in case. The xml files at the moment contain proprietary data. I will try to build non-proprietary data. Thank you again. — user3307740, Aug 07 '14 at 14:26
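
The std::list analogy from the comments above can be sketched with the standard library alone. This is only a stand-in, assuming (as the comments state) that pugixml handles behave like std::list iterators: inserting leaves existing iterators valid, and erasing invalidates only the iterator to the erased element. The function swapRows is hypothetical and mirrors the copy-then-remove approach of switchDataRows. One case it makes visible is a swap of a row with itself: both iterators then refer to the same element, and erasing it twice is undefined behavior, so the sketch returns early. Whether that case actually occurs in the resequencing pass described above is not known from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for switchDataRows, operating on a std::list instead
// of pugixml sibling nodes. Returns false if either index is out of range.
bool swapRows(std::list<std::string>& rows, std::size_t i1, std::size_t i2)
{
    if (i1 >= rows.size() || i2 >= rows.size())
        return false;
    if (i1 == i2)
        return true; // nothing to do; erasing the same element twice would be undefined

    auto row1 = std::next(rows.begin(), static_cast<std::ptrdiff_t>(i1));
    auto row2 = std::next(rows.begin(), static_cast<std::ptrdiff_t>(i2));

    // insert() places the copy before the given position; row1 and row2 stay valid
    rows.insert(std::next(row1), *row2); // copy of row2 right after row1
    rows.insert(std::next(row2), *row1); // copy of row1 right after row2

    // erase() invalidates only the iterator to the erased element
    rows.erase(row1);
    rows.erase(row2);
    return true;
}
```

Starting from { A,B,C,D,E }, swapRows(rows, 0, 3) leaves { D,B,C,A,E }, and the same routine also handles adjacent rows, because each copy is inserted before the originals are erased.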
