
I have an XML file like the one below:

<?xml version="1.0"?>
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
    <date>2017-03-16</date>
  </header>
  <index>
    <indexfamily>ITRAXX-Asian</indexfamily>
    <indexsubfamily>iTraxx Rest of Asia</indexsubfamily>
    <paymentfrequency>3M</paymentfrequency>
    <recoveryrate>0.35</recoveryrate>
    <constituents>
      <constituent>
        <refentity>
          <originalconstituent>
            <referenceentity>ICICI Bank Limited</referenceentity>
            <redentitycode>Y1BDCC</redentitycode>
            <role>Issuer</role>
            <redpaircode>Y1BDCCAA9</redpaircode>
            <jurisdiction>India</jurisdiction>
            <tier>SNRFOR</tier>
            <pairiscurrent>false</pairiscurrent>
            <pairvalidfrom>2002-03-30</pairvalidfrom>
            <pairvalidto>2008-10-22</pairvalidto>
            <ticker>ICICIB</ticker>
            <ispreferred>false</ispreferred>
            <docclause>CR</docclause>
            <recorddate>2014-02-25</recorddate>
            <weight>0.0769</weight>
          </originalconstituent>
        </refentity>
        <refobligation>
          <type>Bond</type>
          <isconvert>false</isconvert>
          <isperp>false</isperp>
          <coupontype>Fixed</coupontype>
          <ccy>USD</ccy>
          <maturity>2008-10-22</maturity>
          <coupon>0.0475</coupon>
          <isin>XS0178885876</isin>
          <cusip>Y38575AQ2</cusip>
          <event>Matured</event>
          <obligationname>ICICIB 4.75 22Oct08</obligationname>
          <prospectusinfo>
            <issuers>
              <origissuersasperprosp>ICICI Bank Limited</origissuersasperprosp>
            </issuers>
          </prospectusinfo>
        </refobligation>
      </constituent>
    </constituents>
  </index>
</data>

I would like to iterate through this file without knowing the tag names. My end goal is to create a hash with tag names and values.

I do not want to use findnodes with XPath for each node. That defeats the whole purpose of writing a generic loader.

I am also using XML-LibXML-2.0126, a slightly older version.

Below is the part of my code that uses findnodes. I also shortened the XML so that this question would not get any longer than it already is :)

use strict;
use warnings;
use XML::LibXML;

my $fileName = 'index.xml';        # assumed path to the XML file shown above
my $parser   = XML::LibXML->new;   # the XML::LibXML parser object

my $xmldoc = $parser->parse_file( $fileName );
my $root = $xmldoc->getDocumentElement() || die( "Could not get Document Element \n" );

foreach my $index ( $root->findnodes( "index" ) ) {    # $root->getChildNodes()) # Get all the Indexes

    foreach my $constituent ( $index->findnodes( 'constituents/constituent' ) ) { # Let's pick up all Constituents

        my $referenceentity = $constituent->findnodes( 'refentity/originalconstituent/referenceentity' );    # This is a crude way. We should be iterating without knowing what's inside

        print "referenceentity :" . $referenceentity . "\n";
        print "+++++++++++++++++++++++++++++++++++ \n";
    }
}
Borodin
BRATVADDI

2 Answers


Use the nonBlankChildNodes, nodeName and textContent methods provided by XML::LibXML::Node:

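# $oc is assumed to be an element node such as <originalconstituent>, e.g.
# my ($oc) = $constituent->findnodes( 'refentity/originalconstituent' );
my %hash;

for my $node ( $oc->nonBlankChildNodes ) {

    my $tag = $node->nodeName;        # element name, e.g. "referenceentity"
    my $value = $node->textContent;   # text inside the element
    $hash{$tag} = $value;
}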

This is equivalent to:

my %hash = map { $_->nodeName, $_->textContent } $oc->nonBlankChildNodes;
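The snippet above flattens one level of children. For the fully generic loader the question asks for, the same three methods can be applied recursively. The following is a minimal sketch, not part of the original answer: `node_to_hash` is a hypothetical helper name, elements with no element children are treated as leaves and become their text, and a tag name that repeats among siblings (several `<constituent>` elements, for example) is collected into an array reference so later siblings do not overwrite earlier ones. Attributes are ignored, and the second `<constituent>` in the sample XML is made up purely to demonstrate the repeat handling.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): turn an element into a
# nested Perl structure. Leaf elements become their text content; a tag
# name that repeats among siblings becomes an array reference.
# Attributes and mixed content are ignored in this sketch.
sub node_to_hash {
    my ($elem) = @_;

    # Keep only element children; blank text nodes are already skipped
    my @children = grep { $_->isa('XML::LibXML::Element') }
                   $elem->nonBlankChildNodes;

    # No element children, so treat it as a leaf
    return $elem->textContent unless @children;

    my %hash;
    for my $child (@children) {
        my $tag   = $child->nodeName;
        my $value = node_to_hash($child);

        if ( !exists $hash{$tag} ) {
            $hash{$tag} = $value;                    # first occurrence
        }
        elsif ( ref $hash{$tag} eq 'ARRAY' ) {
            push @{ $hash{$tag} }, $value;           # third and later
        }
        else {
            $hash{$tag} = [ $hash{$tag}, $value ];   # second occurrence
        }
    }
    return \%hash;
}

# Usage: a cut-down copy of the question's XML with a made-up second
# <constituent> added to show how repeated elements are kept
my $xml = <<'XML';
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
  </header>
  <index>
    <constituents>
      <constituent>
        <refentity><referenceentity>ICICI Bank Limited</referenceentity></refentity>
      </constituent>
      <constituent>
        <refentity><referenceentity>Placeholder Entity</referenceentity></refentity>
      </constituent>
    </constituents>
  </index>
</data>
XML

my $root = XML::LibXML->load_xml( string => $xml )->documentElement;
my $data = { $root->nodeName => node_to_hash($root) };

print $data->{data}{header}{name}, "\n";
```

Collecting repeats into arrays keeps every constituent, at the cost of callers having to check `ref` before deciding whether a value is a single item or a list.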
Zaid
  • Looks neat. However, I am using XML-LibXML-2.0126, and that does not seem to support Node. I'm not sure I want to get into installing a newer version. Is there an alternative? – BRATVADDI May 24 '17 at 13:12
  • That's a fairly recent version of `XML::LibXML`, I'm fairly certain you will have `XML::LibXML::Node`. What does `perl -MXML::LibXML::Node -e 1` on the command line return? – Zaid May 24 '17 at 13:14
  • It fails with: "Can't locate XML/LibXML/Node.pm in @INC (@INC contains: /app/ac/local/lib/perl5 /app/ac/lib/perl5 /app/localapps/perl/lib/sun4-solaris-64int" – BRATVADDI May 24 '17 at 13:22
  • Do you get the same message with `perl -MXML::LibXML -e 1`? – Zaid May 24 '17 at 13:25
  • Nope, that runs fine and doesn't give any errors. `perl -MXML::LibXML::NodeList -e 1` seems to be fine too. Could it be a buggy installation? – BRATVADDI May 24 '17 at 13:27
  • Huh, in that case even `findnodes` would not work, since it's provided by the same package. I don't see how it would be possible to traverse the document without `XML::LibXML::Node` – Zaid May 24 '17 at 13:31
  • findnodes actually works :) I've been using it for some time now. I didn't want to write a findnodes call for every single element, hence this question. – BRATVADDI May 24 '17 at 13:34
  • I think we would need to see how you use it to make sense of what's happening here then. It could be installed in a separate path that is not part of `@INC`, perhaps? – Zaid May 24 '17 at 13:37
  • Reading `Changes` helps. See also https://rt.cpan.org/Public/Bug/Display.html?id=114638 – Sinan Ünür May 24 '17 at 14:59
  • Re "*I am using XML-LibXML-2.0126 and that does not seem to support Node.*": that's not the case at all. All versions of XML::LibXML have ::Node. It's the base class of every DOM object. (XML::LibXML::Node doesn't have its own file, which is why `use XML::LibXML::Node;` and similar are failing. It's provided by `use XML::LibXML;`.) – ikegami May 24 '17 at 15:05
  • The confusion was trying to find XML::LibXML::Node. The solution works fine. thanks! – BRATVADDI May 26 '17 at 09:43
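As ikegami explains, `XML::LibXML::Node` has no `.pm` file of its own, so `perl -MXML::LibXML::Node -e 1` fails even though the class is available. A quick check like the following (a sketch; the one-line XML string is an arbitrary minimal document) shows that loading `XML::LibXML` alone is enough, because every element object inherits from `XML::LibXML::Node`:

```perl
use strict;
use warnings;
use XML::LibXML;    # also provides XML::LibXML::Node and the other DOM classes

my $doc  = XML::LibXML->load_xml( string => '<data><header/></data>' );
my $root = $doc->documentElement;

# The element is an XML::LibXML::Element, a subclass of XML::LibXML::Node,
# so Node methods such as nonBlankChildNodes and nodeName can be called on it
my $class   = ref $root;
my $is_node = $root->isa('XML::LibXML::Node') ? 1 : 0;

print "$class isa XML::LibXML::Node: $is_node\n";
print $_->nodeName, "\n" for $root->nonBlankChildNodes;
```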

Are you sure you want this? It's just as simple to access arbitrary data from a parsed XML::LibXML::Document object as it is from a nested Perl hash. A nested hash will certainly occupy less memory than the equivalent DOM object, if that is your intention, but from your question it doesn't appear so.
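To illustrate the point about accessing data directly from the DOM: the sketch below (not from the original answer; the XML is a cut-down copy of the question's `<header>`) reads every leaf value with a single generic XPath query, `//*[not(*)]`, and keys each value by the node's `nodePath`, so no tag names need to be known in advance and there is no per-element `findnodes` call.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => <<'XML' );
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
  </header>
</data>
XML

# One generic query selects every leaf element (an element with no element
# children); nodePath gives each one a unique XPath-style key
my %leaf;
for my $node ( $doc->findnodes('//*[not(*)]') ) {
    $leaf{ $node->nodePath } = $node->textContent;
}

print "$_ => $leaf{$_}\n" for sort keys %leaf;
```

Because `nodePath` adds a position such as `[2]` when same-named siblings exist, repeated elements like `<constituent>` get distinct keys rather than overwriting each other.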

You can do this easily using the XML::Parser module, which calls a callback every time an "event" occurs in the XML data. In this case all we're interested in is an open tag, a close tag, and a text string.

This example code builds a nested hash from the XML. It dies with an appropriate message if the XML data is malformed (a closing tag doesn't match the name of an opening tag) or if any of the elements has one or more attributes, which can't be represented in this structure.

I've used Data::Dump to display the result.

use strict;
use warnings 'all';

use XML::Parser;
use Data::Dump;

# No Style is needed: the three handlers below do all the work
my $parser = XML::Parser->new(
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    },
);


my %data;
my @data_stack = ( \%data );
my @elem_stack;

$parser->parsefile( 'index.xml' );
dd \%data;


sub handle_start {
    my ($expat, $elem) = @_;

    my $data = $data_stack[-1]{$elem} = { };
    push @data_stack, $data;
    push @elem_stack, $elem;

    if ( @_ > 2 ) {
        my $xpath = join '', map "/$_", @elem_stack;
        die qq{Element at $xpath has attributes};
    }
}


sub handle_end {
    my ($expat, $elem) = @_;

    my $top_elem = pop @elem_stack;
    die qq{Bad XML structure $elem <=> $top_elem} unless $elem eq $top_elem;

    pop @data_stack;
}


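sub handle_char {
    my ($expat, $str) = @_;

    my $top_elem = $elem_stack[-1];
    my $slot     = \$data_stack[-2]{$top_elem};

    # Until real text arrives the slot holds the empty hash created by
    # handle_start, so skip whitespace-only chunks (indentation) there
    return if ref $$slot and $str !~ /\S/;

    # Expat may split one text run over several Char calls, so append
    $$slot = '' if ref $$slot;
    $$slot .= $str;
}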

Output:

{
    data => {
        header => {
            date => "2017-03-16",
            name => "V9 Red Indices",
            version => 9,
        },
        index  => {
            constituents => {
                constituent => {
                    refentity => {
                        originalconstituent => {
                            docclause       => "CR",
                            ispreferred     => "false",
                            jurisdiction    => "India",
                            pairiscurrent   => "false",
                            pairvalidfrom   => "2002-03-30",
                            pairvalidto     => "2008-10-22",
                            recorddate      => "2014-02-25",
                            redentitycode   => "Y1BDCC",
                            redpaircode     => "Y1BDCCAA9",
                            referenceentity => "ICICI Bank Limited",
                            role            => "Issuer",
                            ticker          => "ICICIB",
                            tier            => "SNRFOR",
                            weight          => 0.0769,
                        },
                    },
                    refobligation => {
                        ccy            => "USD",
                        coupon         => 0.0475,
                        coupontype     => "Fixed",
                        cusip          => "Y38575AQ2",
                        event          => "Matured",
                        isconvert      => "false",
                        isin           => "XS0178885876",
                        isperp         => "false",
                        maturity       => "2008-10-22",
                        obligationname => "ICICIB 4.75 22Oct08",
                        prospectusinfo => {
                            issuers => {
                                origissuersasperprosp => "ICICI Bank Limited"
                            },
                        },
                        type => "Bond",
                    },
                },
            },
            indexfamily      => "ITRAXX-Asian",
            indexsubfamily   => "iTraxx Rest of Asia",
            paymentfrequency => "3M",
            recoveryrate     => 0.35,
        },
    },
}
Borodin