
I am halfway through writing a script using XML::Simple. I have read that it is not so "simple", and even its own documentation discourages its use in new code, but I have no other choice as this script will be an extension to existing code.

What I am doing is this:

  1. Get XML by reading from a URL
  2. Parse it using XML::Simple
  3. Read the required elements from the data
  4. Run different checks on these required elements

I can parse the XML and run checks on a few of the elements, but when I read elements that are in an array, I get undef.

This is my code:

#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use LWP::Simple;
use XML::Simple;
use DBI;

use Data::Dumper;

my $str = "<Actual_URL>";

my $ua = LWP::UserAgent->new;
$ua->timeout( 180 );
$ua->agent( "$0/0.1 " . $ua->agent );

my $req = HTTP::Request->new( GET => $str );

my $buffer;
$req->content_type( 'text/xml' );
$req->content( $buffer );

my $response = $ua->request( $req );

my $xml = $response->content();
print "Value of \$xml is:\n";
print $xml;

my $filename = 'record.txt';
open( my $fh, '>', $filename ) or die "Could not open file '$filename' $!";
print $fh $xml;
close $fh;

my $number_of_lines = `wc -l record.txt | cut -d' ' -f1`;
print "Number of lines in $filename are: $number_of_lines\n";
if ( $number_of_lines >= 50 ) {
    print "TEST_1 SUCCESS\n";
}

my $mysql_dbh;
my $test_id;

my $xst;
my %cmts_Pre_EQ_tags;

if ( ( not defined $xml ) or ( $xml =~ m/read\stimeout/i ) ) {
    &printXMLErr( 'DRUM request timed out' );
}
else {
    my $xs = XML::Simple->new();
    $xst = eval { $xs->XMLin( $xml, KeyAttr => 1 ) };
    &printXMLErr( $@ ) if ( $@ );
    print "Value of \$xst inside is:\n";
    print Dumper( $xst );
}

$cmts_Pre_EQ_tags{'$cmts_Pre_EQ_groupDelayMag'} =
    $xst->{cmts}->{Pre_EQ}->{groupDelayMag}->{content};

#More elements like this are checked here
$cmts_Pre_EQ_tags{'$cmts_Pre_EQ_ICFR'} =
    $xst->{cmts}->{Pre_EQ}->{ICFR}->{content};

my $decision1 = 1;
print "\%cmts_Pre_EQ_tags:\n";
foreach ( sort keys %cmts_Pre_EQ_tags ) {
    print "$_ : $cmts_Pre_EQ_tags{$_}\n";
    if ( $cmts_Pre_EQ_tags{$_} eq '' ) {
        print "$_ is empty!\n";
        $decision1 = 0;
    }
}
print "\n";

if ( $decision1 == 0 ) {
    print "TEST_2_1 FAIL\n";
}
else {
    print "TEST_2_1 SUCCESS\n";
}

my $cpeIP4 = $xst->{cmts}->{cpeIP4}->{content};
print "The cpe IP is: $cpeIP4\n";

if ( $cpeIP4 ne '' ) {
    print "TEST_2_2 SUCCESS\n";
}
else {
    print "TEST_2_2 FAIL\n";
}

# Working fine until here, but following 2 print are showing undef
print Dumper ( $xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterTunnelId} );
print Dumper ( $xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterClientIdType} );
print "After\n";

The output of the last three print statements is:

$VAR1 = undef;
$VAR1 = undef;
After

I can't provide the entire XML or the output of print Dumper($xst) as it's too big and gets generated dynamically, but I'll provide a sample of it.

The part of the XML that is causing trouble is

<cmts>
  <STBDSG>
    <dsg>
      <dsgIfStdTunnelFilterTunnelId>1</dsgIfStdTunnelFilterTunnelId>
      <dsgIfStdTunnelFilterClientIdType>caSystemId</dsgIfStdTunnelFilterClientIdType>
    </dsg>
    <dsg>
      <dsgIfStdTunnelFilterTunnelId>2</dsgIfStdTunnelFilterTunnelId>
      <dsgIfStdTunnelFilterClientIdType>gaSystemId</dsgIfStdTunnelFilterClientIdType>
    </dsg>
  </STBDSG>
</cmts>

And when this part is parsed, then its corresponding output in $xst is

$VAR1 = {
    'cmts' => {
            'STBDSG' => {
                'dsg' => [
                         {
                           'dsgIfStdTunnelFilterTunnelId' => '1',
                           'dsgIfStdTunnelFilterClientIdType' => 'caSystemId',
                         },
                         {
                           'dsgIfStdTunnelFilterTunnelId' => '2',
                           'dsgIfStdTunnelFilterClientIdType' => 'gaSystemId',
                         }
                         ]
                     },
    },
};

The XML part where after parsing the values are fetched fine is like this

<cmts>
    <name field_name="Name">cts01nsocmo</name>
    <object field_name="Nemos Object">888</object>
    <vendor field_name="Vendor">xyz</vendor>
</cmts>

Which was converted as:

    $VAR1 = {
      'cmts' => {
        'name' => {
                    'content' => 'cts01nsocmo',
                    'field_name' => 'Name'
                  },
        'object' => {
                      'content' => '888',
                      'field_name' => 'Nemos Object'
                    },
        'vendor' => {
                      'content' => 'xyz',
                      'field_name' => 'Vendor'
                    }
         },
};

So basically, when there is no array in the parsed content, the values are fetched correctly into variables.

It seems that the reason why this

print Dumper ( $xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterTunnelId} );
print Dumper ( $xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterClientIdType} );

is getting undef is related to setting the correct values for either KeyAttr or ForceArray. I am trying to work it out by reading the XML::Simple documentation, but I wanted to see if there's something obvious that I am missing here.

Borodin
300
  • The `$VAR1` result that you show is fine when accessed with `$xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterTunnelId}` etc. The values `1` and `caSystemId` are displayed. You need to show what form the parsed data takes when you are getting `undef` for those values – Borodin Jul 07 '15 at 19:50
  • Can I suggest using an alternative XML parser for consistency and convenience, and building a data structure that *looks* like it came from `XML::Simple` if you need to when you are interacting with the rest of the project. That would be a very simple thing to achieve – Borodin Jul 07 '15 at 20:06
  • XML::Twig has `simplify` built in IIRC. – Sobrique Jul 07 '15 at 20:09
  • @Sobrique: That just occurred to me as well. I have written it up – Borodin Jul 07 '15 at 20:16

2 Answers

It's worth considering the use of XML::Twig, regardless of what the rest of your project does.

In particular, XML::Twig::Elt objects -- the module's implementation of XML elements -- have a simplify method, whose documentation says this:

Return a data structure suspiciously similar to XML::Simple's. Options are identical to XMLin options

So you can use XML::Twig for its precision and convenience, and apply the simplify method if you need to pass on any data that looks like an XML::Simple data structure.

Borodin
  • Thank you Borodin and Sobrique for all the suggestions. I think I'll have to go for XML::Twig if I want to complete this work. The problem is that the machine I am working on has no CPAN to help me get new modules (even if I manage to overcome the code dependency). When I tried to get CPAN using yum, I found that the system is somehow showing as not registered with RHN. But regardless, I'll either install XML::Twig manually or find some other way to get it, as that seems to be the only option available now. – 300 Jul 07 '15 at 20:37
  • @user3209087: It's very simple to install a module by hand. Take a look at [perlmodinstall](http://perldoc.perl.org/perlmodinstall.html). You can download the package on any machine and copy it over on a flash drive. Then the four simple commands in that document will install it for you. You may have to also install any dependencies, but the test phase will highlight those for you – Borodin Jul 07 '15 at 20:54

As you have found - XML::Simple, isn't. Even its documentation suggests:

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces.

Part of the problem is - XML doesn't have any such thing as arrays. It might have duplicated tags. But as such - there is no linear mapping between 'array' and 'XML' so it always makes the programming uncomfortable.

What it's doing to you is assuming that the dsg elements are an array, and casting them automatically.

Anyway, I would suggest using XML::Twig instead - and then your 'print' statements just look like this:

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

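# Reads the XML from this script's __DATA__ section.
my $twig = XML::Twig->new->parse( \*DATA );

# No offset argument, so get_xpath returns every match; '//dsg' matches
# dsg elements at any depth, whatever the document root is.
foreach my $element ( $twig->get_xpath('//dsg') ) {
    print $element->first_child_text('dsgIfStdTunnelFilterTunnelId'),     "\n";
    print $element->first_child_text('dsgIfStdTunnelFilterClientIdType'), "\n";
}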

Anyway, if you're forced into using XML::Simple, and throwing it away and starting over isn't an option (because seriously, I'd consider it!), this is what is going on.

What XML::Simple does with repeated elements is try and pretend they're arrays.

If an element isn't repeated, it treats it as a hash. That's probably what's catching you out. The problem is that in Perl, hashes can't have duplicate keys, so in your example, rather than duplicating dsg, it turns it into an array.
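When the parser options cannot be changed, the difference between the two shapes can be absorbed in the calling code instead. This is only a minimal sketch: the helper name as_list is hypothetical (it is not part of XML::Simple), and the two structures are hand-written stand-ins for what Dumper would show for one and for two dsg elements.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of nodes whether the parser produced
# a single hash reference, an array reference of hashes, or nothing at all.
sub as_list {
    my ($node) = @_;
    return ()       unless defined $node;
    return @{$node} if ref $node eq 'ARRAY';
    return ($node);
}

# Hand-written stand-ins for parsed data (assumed shapes, not real output).
my $one_dsg = { dsg => { dsgIfStdTunnelFilterTunnelId => '1' } };
my $two_dsg = {
    dsg => [
        { dsgIfStdTunnelFilterTunnelId => '1' },
        { dsgIfStdTunnelFilterTunnelId => '2' },
    ],
};

# The same loop now works for both shapes.
for my $parsed ( $one_dsg, $two_dsg ) {
    my @ids = map { $_->{dsgIfStdTunnelFilterTunnelId} } as_list( $parsed->{dsg} );
    print join( ',', @ids ), "\n";
}
```

The loop body never needs to know which shape it received, which is the same consistency that ForceArray gives you at parse time.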

Switching on ForceArray puts everything into arrays, but some of the arrays might be single elements. That's useful if you want consistency though.
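ForceArray also accepts a list of element names, which limits the effect to the elements you expect to repeat. The sketch below assumes a sample with a single dsg, wrapped in a made-up root element so that cmts shows up as a hash key the way it does in the question:

```perl
use strict;
use warnings;
use XML::Simple;

# Sample with a single dsg; the <root> wrapper is an assumption so that
# cmts appears as a key, as it does in the question.
my $xml = <<'XML';
<root>
  <cmts>
    <STBDSG>
      <dsg>
        <dsgIfStdTunnelFilterTunnelId>1</dsgIfStdTunnelFilterTunnelId>
        <dsgIfStdTunnelFilterClientIdType>caSystemId</dsgIfStdTunnelFilterClientIdType>
      </dsg>
    </STBDSG>
  </cmts>
</root>
XML

# Force only dsg into an array and turn off attribute folding.
my $data = XMLin( $xml, ForceArray => ['dsg'], KeyAttr => [] );

my $dsg = $data->{cmts}{STBDSG}{dsg};
print ref($dsg), "\n";
print $dsg->[0]{dsgIfStdTunnelFilterTunnelId}, "\n";
```

With this setting, `{dsg}[0]{...}` is a valid access path whether the response contains one dsg or many.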

KeyAttr probably doesn't help you - that's primarily geared to having different subelements and you wanting to 'map' them. It allows you to turn one of the XML attributes into the 'key' field in a hash.

E.g.

<element name="firstelement">content</element>
<element name="secondelement">morecontent</element>

If you specify KeyAttr as name it will make a hash with keys of firstelement and secondelement.
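A small sketch of that folding, using the two elements above inside a made-up root element (the wrapper is an assumption, added only so the fragment is well-formed). Note that name is also one of XML::Simple's default KeyAttr values, so the same folding happens when KeyAttr is not given at all:

```perl
use strict;
use warnings;
use XML::Simple;

# The <root> wrapper is an assumption added to make the fragment well-formed.
my $xml = <<'XML';
<root>
  <element name="firstelement">content</element>
  <element name="secondelement">morecontent</element>
</root>
XML

# Fold the repeated <element> tags into a hash keyed on the name attribute.
my $data = XMLin( $xml, KeyAttr => ['name'] );

for my $key ( sort keys %{ $data->{element} } ) {
    print "$key => $data->{element}{$key}{content}\n";
}
```

The element text ends up under the content key of each folded entry, which is the same content key the question's code reads for elements that carry attributes.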

As your dsg doesn't have this, then that's not what you want.

To iterate upon dsg:

foreach my $element ( @{ $xst->{cmts}{STBDSG}{dsg} } ) {
    print $element ->{dsgIfStdTunnelFilterTunnelId},     "\n";
    print $element ->{dsgIfStdTunnelFilterClientIdType}, "\n";
}
Sobrique