Total noob here so I am sorry for my ignorance in advance.

Most of what I have searched and messed around with has centered around using XML::LibXML with XPath.

The problem is that I am not looking to capture the text between tags: I need the attribute values and the names of the tags themselves.

This is my XML structure

<users>
  <entry name="asd">
    <permissions>
      <role-based>
        <superuser>yes</superuser>
      </role-based>
    </permissions>
  </entry>
  <entry name="fgh">
    <permissions>
      <role-based>
        <superuser>yes</superuser>
      </role-based>
    </permissions>
    <authentication-profile>RSA Two-Factor</authentication-profile>
  </entry>
  <entry name="jkl">
    <permissions>
      <role-based>
        <superreader>yes</superreader>
      </role-based>
    </permissions>
    <authentication-profile>RSA Two-Factor</authentication-profile>
  </entry>
</users>

I am trying to grab the name attribute (without the quotes) and also determine whether this person is a superuser or superreader.

I am stuck: I cannot do much more than print the nodes. I need to turn this into a CSV file with the structure username;role.

Borodin

4 Answers

The easiest way to extract information from XML documents with XML::LibXML is to use the find family of methods. These methods use an XPath expression to select nodes and values from the document. The following script extracts the data you need:

use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(location => 'so.xml');

for my $entry ($doc->findnodes('//entry')) {
    my $name = $entry->getAttribute('name');
    # Name of the child of <role-based> whose text is "yes"
    my $role = $entry->findvalue(
        'local-name(permissions/role-based/*[.="yes"])'
    );
    print("$name;$role\n");
}

It prints

asd;superuser
fgh;superuser
jkl;superreader

I used the local-name XPath function to get the name of the role element.

Note that you might want to use Text::CSV to create CSV files in a more robust way.
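As a rough sketch of that suggestion, the same extraction can feed Text::CSV instead of a bare print. The inline XML sample and the helper name user_rows are assumptions made for illustration (they are not part of the script above), and the semicolon separator is taken from the format the question asks for.

```perl
use strict;
use warnings;
use XML::LibXML;
use Text::CSV;

# Inline sample so the sketch is self-contained (an assumption);
# the script above reads 'so.xml' instead.
my $xml = <<'XML';
<users>
  <entry name="asd"><permissions><role-based><superuser>yes</superuser></role-based></permissions></entry>
  <entry name="jkl"><permissions><role-based><superreader>yes</superreader></role-based></permissions></entry>
</users>
XML

# Hypothetical helper: returns one [name, role] pair per <entry> element.
sub user_rows {
    my ($doc) = @_;
    my @rows;
    for my $entry ($doc->findnodes('//entry')) {
        my $role = $entry->findvalue(
            'local-name(permissions/role-based/*[.="yes"])'
        );
        push @rows, [ $entry->getAttribute('name'), $role ];
    }
    return @rows;
}

my $doc = XML::LibXML->load_xml(string => $xml);

# sep_char => ';' matches the requested "username;role" layout.
my $csv = Text::CSV->new({ binary => 1, sep_char => ';', eol => "\n" })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Text::CSV handles quoting if a name ever contains the separator or a quote.
$csv->print(\*STDOUT, $_) for user_rows($doc);
```

To write to a file rather than standard output, open a filehandle and pass it to print in place of \*STDOUT.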

nwellnhof
  • Thanks for the quick response. I will have to try this out. When I tried the getAttribute call earlier, I was told the method was unknown in my package. I was using XML::LibXML; maybe I have dependency issues somewhere? – user2891632 Oct 17 '13 at 18:52
  • @user2891632, if you're still having problems, post a new question showing the actual code you're using and the errors you're getting. – friedo Oct 17 '13 at 18:53

Another solution with a different module, XML::Twig:

#!/usr/bin/env perl

use warnings;
use strict;
use XML::Twig;

my ($name, $role);

my $twig = XML::Twig->new(
    twig_handlers => {
        'entry' => sub { 
            $name = $_->att('name');
            if ( defined $name && defined $role ) { 
                printf qq|%s;%s\n|, $name, $role;
            }   
            ($name, $role) = ();    # reset both for the next entry
        },  
        'role-based' => sub { $role = $_->first_child->tag },
    },  
)->parsefile( shift );

Run it like:

perl script.pl xmlfile

That yields:

asd;superuser
fgh;superuser
jkl;superreader
Birei

Using XML::Rules:

use XML::Rules;

print "name is_superuser is_superreader\n";
my @rules = (
  entry => sub {
    my $entry = $_[1];
    $_ ||= 'no' for @$entry{qw(superuser superreader)};
    print "$entry->{name} $entry->{superuser} $entry->{superreader}\n";
  },
  'permissions,role-based' => 'pass no content',
  'superuser,superreader' => 'content',
  _default => undef,
);

my $p = XML::Rules->new(rules => \@rules);
my $s = $p->parse(doc());

sub doc {
return <<XML;
<users>
   <entry name="asd">
       <permissions>
            <role-based>
                <superuser>yes</superuser>
            </role-based>
       </permissions>
   </entry>
   <entry name="fgh">
       <permissions>
            <role-based>
                <superuser>yes</superuser>
            </role-based>
       </permissions>
       <authentication-profile>RSA Two-Factor</authentication-profile>
   </entry>
   <entry name="jkl">
       <permissions>
            <role-based>
                <superreader>yes</superreader>
            </role-based>
       </permissions>
       <authentication-profile>RSA Two-Factor</authentication-profile>
   </entry>
</users>
XML
}

Or an alternative set of rules, which assumes (among other things) that the content of your key fields is always 'yes':

my $name;
my @rules = (
  '^entry' => sub {
    $name = $_[1]->{name};
    return 1;  # a start rule must return true, or the element is skipped
  },
  'superuser,superreader' => sub {
    print "$name,$_[0]\n";
  },
  _default => undef,
);
runrig

I like using XML::Simple for projects like this.

For example:

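use strict;
use warnings;
use XML::Simple;

my $su = $ARGV[0];
die "Usage: $0 xmlfile\n" unless defined $su && -e $su;

# ForceArray keeps <entry> an array even when there is only one,
# so entries are always folded into a hash keyed by their name attribute.
my $su_xml = XMLin($su, ForceArray => [ 'entry' ]);
my $suref = $su_xml->{entry};

# Hash keys come back in no particular order.
foreach my $key (keys %{$suref}) {
    my $rb = ${$suref}{$key}->{permissions}->{'role-based'};
    foreach my $rbkey (keys %{$rb}) {
        print "$key\t$rbkey\t${$rb}{$rbkey}\n";
    }
}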

prints:

fgh     superuser       yes
asd     superuser       yes
jkl     superreader     yes
David
  • `XML::Simple` is often too simple. For example, your script breaks if there's only one `<entry>` within `<users>`. – Slaven Rezic Oct 17 '13 at 20:43
  • @SlavenRezic Good catch! Fortunately, `XML::Simple` is highly configurable and easily accounts for this case through the use of `ForceArray`. Solution updated. – David Oct 17 '13 at 21:45
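
To illustrate the single-entry case discussed in these comments, here is a small sketch that parses a one-entry document with and without ForceArray. The inline XML string and the variable names $plain and $forced are assumptions made for the example. Without ForceArray the lone entry should stay a plain hash of its own fields, while with ForceArray it should be folded into a hash keyed by the name attribute, as the script in the answer expects.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# A document with a single <entry> (assumed sample data).
my $xml = '<users><entry name="asd"><permissions><role-based>'
        . '<superuser>yes</superuser>'
        . '</role-based></permissions></entry></users>';

# Default behaviour: a lone element is not an array, so no folding by name.
my $plain = XMLin($xml);

# ForceArray makes <entry> an array first, which is then folded by 'name'.
my $forced = XMLin($xml, ForceArray => ['entry']);

print "without ForceArray: ", join(',', sort keys %{ $plain->{entry} }),  "\n";
print "with ForceArray:    ", join(',', sort keys %{ $forced->{entry} }), "\n";
```

The first line lists the entry's own fields, so looping over the keys as user names would go wrong; the second lists the user names.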