I need to re-sequence the majority of child nodes at one level within my document.
The document has a structure that looks (simplified) like this:
sheet
  table
    row
      parameters
    row
      parameters
    row
      parameters
    row
      cell
        header string
      cell
        header string
      cell
        header string
    data row A
      cell
        data
      cell
        data
      cell
        data
    data row B
      cell
        data
      cell
        data
      cell
        data
    data row C
      cell
        data
      cell
        data
      cell
        data
    data row D
      cell
        data
      cell
        data
      cell
        data
    data row E
      cell
        data
      cell
        data
      cell
        data
    row
      parameters
    row
      parameters
    row
      parameters
    row
      parameters
    row
      parameters
I'm using pugixml to load, parse, traverse, and access the large XML file, and ultimately I compute a new sequence for the data rows. I know I'm parsing everything correctly: looking at the resequencing results, I can see that the reading and processing are correct. After all my optimizing and processing, the solution is a list of indices in a revised order, like { D, A, E, C, B } for the example above. So now I need to actually move the rows into this new order and then write the resulting XML to a new file. The actual data is about 16 MB, with several hundred data row nodes and more than a hundred data elements in each row.
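To pin down what "a list of indices in a revised order" means here, the following is a minimal sketch that uses plain strings as stand-ins for the row nodes. The function name applyOrder and the zero-based indexing are assumptions made for illustration; nothing in it touches pugixml.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the rows rearranged so that
// result[i] == rows[newOrder[i]].
// Throws if newOrder does not name every row exactly once.
std::vector<std::string> applyOrder(const std::vector<std::string>& rows,
                                    const std::vector<std::size_t>& newOrder)
{
    if (newOrder.size() != rows.size())
        throw std::invalid_argument("order list must name every row exactly once");

    std::vector<bool> seen(rows.size(), false);
    std::vector<std::string> result;
    result.reserve(rows.size());

    for (std::size_t index : newOrder) {
        if (index >= rows.size() || seen[index])
            throw std::invalid_argument("order list is not a permutation");
        seen[index] = true;
        result.push_back(rows[index]);
    }
    return result;
}
```

With rows { A, B, C, D, E } and the index list { 3, 0, 4, 2, 1 }, this yields { D, A, E, C, B }, which is the example order above. Checking that the list is a true permutation up front is cheap and rules out one source of lost or duplicated rows before any XML is modified.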
I've written a routine to swap two data rows, but something I'm doing destroys the XML's structural consistency during the swaps. I'm sure I don't understand how pugixml moves nodes around and/or invalidates node handles.
I create and set aside node handles -- pugi::xml_node -- to the "table" level node, to the "header" row node, and to the "first data" row node, which in the original form above would be node "data row A". I know these handles give me correct access to the right data -- I can pause execution and look into them during the optimization and resequencing calculations and examine the rows and their siblings and see the input order.
The "header row" is always a particular child of the table, and the "first data row" is always the sibling immediately after the "header row". So I set these up when I load the file and check them for data consistency.
My understanding of node::insert_copy_before is this:
pugi::xml_node new_node_handle_in_document = parentnode.insert_copy_before( node_to_be_copied_to_child_of_parent , node_to_be_copied_nodes_next_sibling );
My understanding is that a deep recursive clone of node_to_be_copied_to_child_of_parent with all children and attributes will be inserted as the sibling immediately before node_to_be_copied_nodes_next_sibling, where both are children of parentnode.
Clearly, if node_to_be_copied_nodes_next_sibling is also the "first data row", then my handle to the first data row may still be valid after the operation, but it will no longer refer to the first data row in the table. But does calling insert_copy_before on the document force updates to other node handles, whether or not they are near the change?
So let's look at the code I'm trying to make work:
// a method to switch data rows
bool switchDataRows( int iRow1 , int iRow2 )
{
    // temp vars
    int iloop;

    // navigate to the first row and create a handle that can move along siblings until we find the target
    pugi::xml_node xmnRow1 = m_xmnFirstDataRow;
    for ( iloop = 0 ; iloop < iRow1 ; iloop++ )
        xmnRow1 = xmnRow1.next_sibling();

    // navigate to the second row and create another handle that can move along siblings until we find the target
    pugi::xml_node xmnRow2 = m_xmnFirstDataRow;
    for ( iloop = 0 ; iloop < iRow2 ; iloop++ )
        xmnRow2 = xmnRow2.next_sibling();

    // ok.... so now get convenient handles on the locations of the two nodes by creating handles to the nodes AFTER each
    pugi::xml_node xmnNodeAfterFirstNode = xmnRow1.next_sibling();
    pugi::xml_node xmnNodeAfterSecondNode = xmnRow2.next_sibling();
    // at this point I know all the handles I've created are pointing towards the intended data.

    // now copy the second to the location before the first
    pugi::xml_node xmnNewRow2 = m_xmnTableNode.insert_copy_before( xmnRow2 , xmnNodeAfterFirstNode );
    // here's where my concern begins. Does this copy do what I want it to do, moving a copy of the
    // second target row into the position under the table node as the child immediately before
    // xmnNodeAfterFirstNode ? If it does, might this operation invalidate other handles to data row
    // nodes? Are all bets off as soon as we do an insert/copy in a list of siblings, or will handles
    // to other nodes in that list of children remain valid?

    // now copy the first to the spot before the second
    pugi::xml_node xmnNewRow1 = m_xmnTableNode.insert_copy_before( xmnRow1 , xmnNodeAfterSecondNode );
    // clearly, if other handles to data row nodes have been invalidated by the first insert_copy,
    // then these handles aren't any good any more...

    // now delete the old rows
    bool bDidRemoveRow1 = m_xmnTableNode.remove_child( xmnRow1 );
    bool bDidRemoveRow2 = m_xmnTableNode.remove_child( xmnRow2 );
    // this is my attempt to remove the original data row nodes after they've been copied to their new locations

    // we have to update the first data row!!!!!
    bool bDidRowUpdate = updateFirstDataRow(); // a routine that starts with the header row node and finds the next sibling, the first data row
    // as before, if using the insert_copy methods results in many of the handles moving around, then
    // I won't be able to base an update of the "first data row node" handle on the "known" handle to
    // the header row node.

    // return the result
    return( bDidRemoveRow2 && bDidRemoveRow1 && bDidRowUpdate );
}
As I said, this destroys the structural consistency of the resulting XML. I can save it, but nothing will read it except Notepad. The table ends up somewhat garbled. If I try to read it with my own program, the reader reports an "element mismatch" error and, understandably, refuses to load it.
So I'm doing one or more things wrong. What are they?