I have an XML file that looks like this:

<booklist>
   <book type="technical">
      <author>Book 1 author 1</author>
      <author>Book 1 author 2</author>
      <title>Book 1 title</title>
      <isbn>Book1ISBN</isbn>
   </book>
   <book type="fiction">
      <author>Book 2 author 1</author>
      <author>Book 2 author 2</author>
      <title>Book 2 title</title>
      <isbn>Book2ISBN</isbn>
   </book>
   <book type="technical">
      <author>Book 3 author 1</author>
      <author>Book 3 author 2</author>
      <author>Book 3 author 3</author>
      <title>Book 3 title</title>
      <isbn>Book3ISBN</isbn>
   </book>
</booklist>

I run the file through Data::Dumper with this script:

#!/usr/bin/perl
use strict ;
use warnings ;
use XML::Simple ;
use Data::Dumper ;
my $book;

my $booklist = XMLin('book.xml_with_attrib');
print Dumper($booklist);

#foreach $book (@{$booklist->{author}} ) {
#     print $book->{title}  ;
#     print "\n";
#}

This is the dump:

/tmp/walt $ /tmp/walt/bookparse_by_attrib.pl
$VAR1 = {
          'book' => [
                    {
                      'isbn' => 'Book1ISBN',
                      'title' => 'Book 1 title',
                      'author' => [
                                  'Book 1 author 1',
                                  'Book 1 author 2'
                                ],
                      'type' => 'technical'
                    },
                    {
                      'isbn' => 'Book2ISBN',
                      'title' => 'Book 2 title',
                      'author' => [
                                  'Book 2 author 1',
                                  'Book 2 author 2'
                                ],
                      'type' => 'fiction'
                    },
                    {
                      'isbn' => 'Book3ISBN',
                      'title' => 'Book 3 title',
                      'author' => [
                                  'Book 3 author 1',
                                  'Book 3 author 2',
                                  'Book 3 author 3'
                                ],
                      'type' => 'technical'
                     }
                   ]
        };

However, when I try to print the authors, this is what I get:

foreach $book (@{$booklist->{book}} ) {
     print $book->{author}  ;
     print "\n";
}

ARRAY(0x249a140)
ARRAY(0x249a098)
ARRAY(0x2499fc0)

How would I print out the authors?

capser

2 Answers

In that data structure, the author key holds an array reference. Printing the reference itself gives ARRAY(0x...), so you need to either iterate over the array or dereference it before printing:

foreach $book (@{$booklist->{book}} ) {
     print "@{$book->{author}}\n";
}
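
To make the difference concrete, here is a minimal sketch using a hand-built hash shaped like one entry of the dump (the variable names $as_ref, $spaced and $joined are only for illustration). It shows the reference stringifying to ARRAY(0x...), the interpolated dereference, and join when you want a specific separator:

```perl
use strict;
use warnings;

# Sample data shaped like one entry of $booklist->{book} in the dump.
my $book = {
    title  => 'Book 1 title',
    author => [ 'Book 1 author 1', 'Book 1 author 2' ],
};

# Stringifying the reference itself yields something like ARRAY(0x249a140).
my $as_ref = "$book->{author}";

# Interpolating the dereferenced array separates the elements with a
# space (the default list separator).
my $spaced = "@{ $book->{author} }";

# join gives explicit control over the separator.
my $joined = join ', ', @{ $book->{author} };

print "$as_ref\n$spaced\n$joined\n";
```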

However, I'd advise you to use a better XML Parsing module than XML::Simple. This is the advice of the module itself:

STATUS OF THIS MODULE

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.

The major problems with this module are the large number of options and the arbitrary ways in which these options interact - often with unexpected results.

Patches with bug fixes and documentation fixes are welcome, but new features are unlikely to be added.

Currently, if you have a record with only a single author, the author key will hold a plain scalar instead of an array reference. This can be adjusted with options in XML::Simple (such as ForceArray), but honestly it's not worth the effort.

Instead, I'd recommend using better modules like XML::LibXML or XML::Twig, to avoid inconsistencies in parsing:

use strict;
use warnings;

use XML::LibXML;

my $data = do {local $/; <DATA>};

my $xml = XML::LibXML->load_xml(string => $data);

for my $book ($xml->findnodes('//book')) {
    my $title = $book->findvalue('title');
    print "Title = '$title'\n";

    for my $author ($book->findnodes('author')) {
        print "   " . $author->textContent() . "\n";
    }
}

__DATA__
<booklist>
   <book type="technical">
      <author>Book 1 author 1</author>
      <title>Book 1 title</title>
      <isbn>Book1ISBN</isbn>
   </book>
   <book type="fiction">
      <author>Book 2 author 1</author>
      <author>Book 2 author 2</author>
      <title>Book 2 title</title>
      <isbn>Book2ISBN</isbn>
   </book>
   <book type="technical">
      <author>Book 3 author 1</author>
      <author>Book 3 author 2</author>
      <author>Book 3 author 3</author>
      <title>Book 3 title</title>
      <isbn>Book3ISBN</isbn>
   </book>
</booklist>

Outputs:

Title = 'Book 1 title'
   Book 1 author 1
Title = 'Book 2 title'
   Book 2 author 1
   Book 2 author 2
Title = 'Book 3 title'
   Book 3 author 1
   Book 3 author 2
   Book 3 author 3
Miller
  • I'll delete my answer, +1 for the well explained answer. – hwnd Jul 25 '14 at 21:49
  • oo wow! I've been saying to stay away from this module for a long time. I had no idea the module was saying this as well now! – ikegami Jul 26 '14 at 03:58
  • @hwnd - I saw your answer on Friday, and it looked good. I preferred your method because this shop does not have XML::LibXML, and the process to get a new module involves a lot of permissions and tickets with the sysadmins. I have to work with XML::Simple for now, bugs and all. – capser Jul 28 '14 at 12:19
  • @hwnd - feel free to repost your answer, or feel free to get in touch with me at the listed address. – capser Jul 28 '14 at 12:29

Since the author key holds an array ref, you need to dereference it too:

foreach my $book ( @{ $booklist->{ book } } ) {
    foreach my $author ( @{ $book->{ author } } ) {
        print "$author\n";
    }
}
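
One caveat if you stay on XML::Simple: with its default options, a book with a single author can come back as a plain string rather than an array ref, and the inner loop would then die under strict refs. A minimal sketch of a guard, assuming a hypothetical helper named as_list and hand-built sample data in place of the parsed file (passing ForceArray => ['author'] to XMLin is the other way to get a consistent shape):

```perl
use strict;
use warnings;

# Hypothetical helper: return a list whether the value is a plain scalar
# (one <author> element) or an array reference (several <author> elements).
sub as_list {
    my ($value) = @_;
    return () unless defined $value;
    return ref $value eq 'ARRAY' ? @{$value} : ($value);
}

# Sample data standing in for @{ $booklist->{book} }.
my @books = (
    { title => 'Book 1 title', author => 'Book 1 author 1' },
    { title => 'Book 2 title', author => [ 'Book 2 author 1', 'Book 2 author 2' ] },
);

for my $book (@books) {
    for my $author ( as_list( $book->{author} ) ) {
        print "$author\n";
    }
}
```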
Leeft