
Dear Perl and XML gurus

I need to update values inside an XML file using XPath.
I use the XML::LibXML library in Perl to read, alter, and save the XML file:

use XML::LibXML;  # needed for the parser and the XML_*_NODE constants used below

# Read XML file
my $parser = XML::LibXML->new();
my $doc = $parser->load_xml(location => $config_file);
my $root = $doc->documentElement();

# Alter nodes/attributes
foreach (keys %environment_values) {
    my @nodes = $root->findnodes($_);
    if (scalar @nodes < 1) {
        print "ERROR: element not found in $config_file by XPath: $_\n";
        die;
    } elsif (scalar @nodes > 1) {
        print "ERROR: more than 1 element (" . scalar @nodes . ") is found in $config_file by XPath: $_\n";
        die;
    }
    my $node = $nodes[0];
    if  ($node->nodeType == XML_ELEMENT_NODE) {
        $node->removeChildNodes();
        $node->appendText($environment_values{$_});
    } elsif ($node->nodeType == XML_ATTRIBUTE_NODE) {
        $node->setValue($environment_values{$_});
    } else {
        print "ERROR: unknown node type: " . $node->nodeType . "\n";
        die;
    }
}

# Save the resulting XML file
open (my $fh, '>:raw', $config_file) or die $!;
print $fh $doc->toString();
close $fh;

Although it generates a file very similar to the original one, there are still a couple of nuisances:

  1. Line endings are Unix-style, although the original file uses Windows-style line endings.
  2. The space before the closing /> gets removed, e.g. <node /> becomes <node/>.

Is there any chance to fix these? I'm hoping to get exactly the same XML file as the original one, with the only differences being the attribute values I'm amending...

P.S. I really like how simple <xmlpoke> is in NAnt, but I have to use Perl for this work.

Ivan
  • A compliant XML parser cannot be round-trip safe (meaning that, for all valid inputs, the output is exactly the same as the input). There are multiple places in the XML standard that require the input stream to be modified in a destructive fashion. A space before the closing `/>` of an empty element is neither required nor forbidden, but I don't know of many parsers that would remember it for serialization later. – Ven'Tatsu Aug 31 '12 at 19:05
  • Thanks, Ven'Tatsu! Interesting information... – Ivan Sep 03 '12 at 12:49

2 Answers


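I think the answer to the newline problem lies in the mode you open the output file with: the :raw layer turns off CRLF translation, so "\n" is written as a bare LF even on Windows.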

As per http://perldoc.perl.org/PerlIO.html#Defaults-and-how-to-override-them :

If the platform is MS-DOS like and normally does CRLF to "\n" translation for text files then the default layers are :

   unix crlf

(The low level "unix" layer may be replaced by a platform specific low level layer.)
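To illustrate the layer behaviour quoted above, here is a minimal sketch, assuming a hypothetical helper named `write_with_crlf` (not part of the original script). It pushes the `:crlf` layer explicitly, so "\n" should come out as "\r\n" regardless of the platform the script runs on. The text argument stands in for whatever `$doc->toString()` returns, and the sketch assumes an ASCII-compatible encoding such as UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: write $text to $path with the :crlf layer pushed
# explicitly, so every "\n" in $text is written to disk as "\r\n".
sub write_with_crlf {
    my ($path, $text) = @_;
    open(my $fh, '>:crlf', $path) or die "Cannot open $path: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $path: $!";
    return;
}
```

In the script from the question, this would replace the `open`/`print`/`close` lines, e.g. `write_with_crlf($config_file, $doc->toString())`. Simply dropping `:raw` instead leaves the choice of line endings to the platform defaults.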

DVK
  • Thanks for the very useful link! Removing :raw from my code fixed the line endings; it seems that Perl itself takes care of them depending on the platform it is running on. – Ivan Sep 03 '12 at 12:43
  • @Ivan - you are welcome. If the answer was useful, feel free to accept it (checkmark next to it) and/or upvote it (up arrow next to it). – DVK Sep 03 '12 at 17:15

In general you're not going to get exactly what you seem to be asking for: for example, the distinction between single and double quotes around attribute values will be lost, as well as spaces inside tags.

The best approach might be to read the file in with Perl once and write it out without changes, then run your script, and compare those two files.
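The unchanged round trip suggested above could look roughly like this. It is only a sketch: the helper name `write_baseline` and the file names in the usage note are hypothetical, and it assumes XML::LibXML is installed.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse $in and serialize it straight back to $out
# without touching any node, so the result shows only the changes made
# by the parser/serializer itself.
sub write_baseline {
    my ($in, $out) = @_;
    my $doc = XML::LibXML->load_xml(location => $in);
    $doc->toFile($out);
    return $out;
}
```

For example, `write_baseline('config.xml', 'config.baseline.xml')` would produce a baseline file. Comparing that baseline with the output of your script should then show only the values you actually changed.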

barefootliam
  • Yes, you are right, the parser reads and writes the full XML content. So there might be differences in presentation, but the XML will still be well-formed after writing, which is the main point. I guess I can live with minor changes in presentation )) – Ivan Sep 03 '12 at 12:45