1

I have a XML (example) file: test.xml

<root>
   <tag1>AAA</tag1>
   <tag2>BBB</tag2>
   <tag3>
      <tag4>DDD</tag4>
   </tag3>
</root>

The result I want to achieve is, set two variables (from input): i.e.:

my $xpath = '/root/tag3/tag4';   # or '/root/tag2/tag5' or '/root/tag6'
my $xvalue = 'CCC';              # or 'EEE'

The script would check the $xpath variable, if it exists in the XML file, then it changes the text of it. If it doesn't exist in the XML file, then it creates the element with $xpath and $xvalue.

I use below script to set the text for $xpath, but how to modify it so that it would do proper things based on the $xpath existence? Thanks a lot,

open( my $output, '>', "$ofile") or die "cannot create $ofile: $!";
XML::Twig->new( twig_roots => { "$xpath" =>
                               sub { my $text= $_->text();
                                     $_->set_text($xvalue);
                                     $_->flush;
                                   },
                             },
            twig_print_outside_roots => $output,
            pretty_print => 'indented',
          )
          ->parsefile( "test.xml" );
toolic
  • 57,801
  • 17
  • 75
  • 117
dellair
  • 427
  • 4
  • 22

1 Answers1

4

It's a fairly simple task using a recursive subroutine

In the program below, each call to add_xpath advances the value of $node and removes one step from the XPath expression in the $path parameter

  • If the path begins with a slash and a tag name then the tag name is checked to make sure it matches the name of the root element. Then the current node is set to the root element and the subroutine recurses

  • If the path starts immediately with a tag name, then has_child is called to see if a child of that name already exists. If not then insert_new_elt adds one for us. The current node is set to the new or pre-existing child node and the subroutine recurses

  • Otherwise the path should be empty, and it is checked to make sure. Then set_text is called to set the text contents of the currenty node and the recursion terminates

The output show the resulting XML structure after each of the three operations that you show in your question

use strict;
use warnings;

use XML::Twig;
use Carp;

my $twig = XML::Twig->new;
$twig->parsefile('test.xml');
$twig->set_pretty_print('indented');
print $twig->sprint, "\n";

add_xpath($twig->root, '/root/tag3/tag4', 'CCC');
print $twig->sprint, "\n";

add_xpath($twig->root, '/root/tag2/tag5', 'EEE');
print $twig->sprint, "\n";

add_xpath($twig->root, '/root/tag6', 'GGG');
print $twig->sprint, "\n";

sub add_xpath {
    my ($node, $path, $value) = @_;

    if ( $path =~ s|^/(\w+)/?|| ) {
        my $tag = $1;
        $node = $node->root;
        carp "Root element has wrong tag name" unless $node->tag eq $tag;
    }
    elsif ( $path =~ s|^(\w+)/?|| ) {
        my $tag = $1;
        if ( my $child = $node->has_child($tag) ) {
            $node = $child;
        }
        else {
            $node = $node->insert_new_elt('last_child', $tag);
        }
    }
    else {
        carp qq{Invalid path at "$path"} if $path =~ /\S/;
        $node->set_text($value);
        return 1;
    }

    add_xpath($node, $path, $value);
}

output

<root>
  <tag1>AAA</tag1>
  <tag2>BBB</tag2>
  <tag3>
    <tag4>DDD</tag4>
  </tag3>
</root>

<root>
  <tag1>AAA</tag1>
  <tag2>BBB</tag2>
  <tag3>
    <tag4>CCC</tag4>
  </tag3>
</root>

<root>
  <tag1>AAA</tag1>
  <tag2>BBB<tag5>EEE</tag5></tag2>
  <tag3>
    <tag4>CCC</tag4>
  </tag3>
</root>

<root>
  <tag1>AAA</tag1>
  <tag2>BBB<tag5>EEE</tag5></tag2>
  <tag3>
    <tag4>CCC</tag4>
  </tag3>
  <tag6>GGG</tag6>
</root>
Borodin
  • 126,100
  • 9
  • 70
  • 144
  • Totally working. I added the print to file to save it to a file. The methodology is very nice. – dellair Sep 01 '15 at 20:47
  • @dellair: I'm happy to help. I was inspired by [this solution](http://stackoverflow.com/a/509340/622310) to a similar problem using C#, but `XML::Twig` makes it difficult because it won't give you access to the root node (the root *element* is different -- it's the only child of the root *node*, which has no name) so I had to treat the first step after after an initial `/` as a special case. – Borodin Sep 01 '15 at 22:07