I'm currently using a Perl script with XML::LibXML to process a given XML file. This works well for the most part, but I struggle as soon as a node contains both child nodes and free text (mixed content). An example input would be:
<Errors>
<Error>
this node works fine
</Error>
<Error>
some text <testTag>with a node</testTag> in between
</Error>
</Errors>
Expected output:
<Errors>
<Error>
this node works fine
</Error>
<Error>
some text HELLOwith a nodeHELLO in between
</Error>
</Errors>
I tried replaceChild("HELLO", $testTagNode); to replace the node with a string, which I could then (if needed) process further with a simple search-replace, but all I get is the "not a blessed reference" error. (I feel like that would have been pretty dirty even if it had worked that way.)
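For reference, a minimal sketch of the node-based variant of that attempt. It assumes the error appears because replaceChild expects a node object rather than a plain string as its first argument, so the string is wrapped in a text node first. The XPath expression and variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = '<Errors><Error>some text <testTag>with a node</testTag> in between</Error></Errors>';
my $doc = XML::LibXML->load_xml(string => $xml);

for my $tag ($doc->findnodes('//Error/testTag')) {
    # Build a text node holding the marker, the tag's text, and the marker again
    my $text = $doc->createTextNode('HELLO' . $tag->textContent . 'HELLO');
    # replaceChild takes (new node, old node) and is called on the parent
    $tag->parentNode->replaceChild($text, $tag);
}

my ($error) = $doc->findnodes('//Error');
my $result  = $error->textContent;
print "$result\n";    # some text HELLOwith a nodeHELLO in between
```

Because the new text node takes the place of the old element, its position inside the parent is preserved.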
If I try to run a simple search-replace directly on the parent node like this
$error=~s/\</HELLO/g;
it simply never matches (whether I escape the < or not), because LibXML seems to ignore every tag that I don't specifically ask for. If I print out the second Error, it also gives me just
some text with a node in between
which is actually a very nice feature for the rest of the file, but not in this instance.
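A minimal sketch of why the substitution may never match, assuming the value being searched is the node's text content: textContent returns only the character data, whereas toString serializes the node including its markup. The variable names are illustrative, and the regex here only changes a string copy, not the document itself.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<Error>some text <testTag>with a node</testTag> in between</Error>'
);
my $error = $doc->documentElement;

my $text   = $error->textContent;    # character data only, tags are gone
my $markup = $error->toString;       # serialized XML, tags included

# A regex can only see the tags in the serialized form
(my $replaced = $markup) =~ s{</?testTag>}{HELLO}g;

print "$text\n";        # some text with a node in between
print "$markup\n";      # <Error>some text <testTag>with a node</testTag> in between</Error>
print "$replaced\n";    # <Error>some text HELLOwith a nodeHELLO in between</Error>
```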
I can however do
$error->removeChild($testTagNode);
which shows me that the node does get found, but that doesn't get me any further. I could theoretically remove the node, save its content, and then insert the content back into the parent; the problem is that it has to end up at the exact position where the node was before. The only other thing I can think of is to read the entire file in as a string and run a basic search-replace over it BEFORE feeding it into LibXML, but that could create a pretty big overhead and isn't really a nice solution.
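A small sketch of how the position inside the parent could be inspected, assuming the DOM keeps mixed content as an ordered list of children (text, element, text). It only lists the child nodes via childNodes and nodeName; nothing is modified.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<Error>some text <testTag>with a node</testTag> in between</Error>'
);

# Collect the name of every direct child in document order;
# text nodes report the constant DOM name "#text"
my @names = map { $_->nodeName } $doc->documentElement->childNodes;

print join(',', @names), "\n";
```

If the element sits between two text nodes like this, any node that replaces it in place ends up at the same position.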
I feel like I'm overlooking something substantial, as this looks like a pretty basic task, but I can't seem to find anything. Maybe I'm just looking in the wrong direction and there is a completely different approach available. Any help is appreciated.