Force perl XMLin to treat an empty tag as empty string?

Question

Having the following:

perl -MXML::LibXML::Simple -MData::Dumper -E '$h=XMLin("<some><bubu>string</bubu></some>");say Dumper $h'

is parsed as:

$VAR1 = {
          'bubu' => 'string'
        };

but the

perl -MXML::LibXML::Simple -MData::Dumper -E '$h=XMLin("<some><bubu/></some>"); say Dumper $h'

or

perl -MXML::LibXML::Simple -MData::Dumper -E '$h=XMLin("<some><bubu></bubu></some>");say Dumper $h'

prints:

$VAR1 = {
          'bubu' => {}
        };

Is it possible to get

$VAR1 = {
          'bubu' => ""
        };

To be consistent with other string values?

The real code behind the question is like:

package Something {
    use Moose;
    has 'bar' => (is => 'ro', isa => 'Str');
    has 'baz' => (is => 'ro', isa => 'Str');
}

use 5.014;
use warnings;
use XML::LibXML::Simple;

my $xml = do {local $/, <DATA>};
my $hr = XMLin($xml);
for my $node( @{$hr->{node}} ) {
    my $obj = Something->new($node);
}
__DATA__
<root>
<node>
    <bar>bar1</bar>
    <baz>baz1</baz>
</node>
<node>
    <bar>bar2</bar>
    <baz/>
</node>
</root>

which dies with

Attribute (baz) does not pass the type constraint because: Validation failed for 'Str' with value HASH(0x7f91a4a92450) at /opt/anyenv/envs/plenv/versions/5.24.0/lib/perl5/site_perl/5.24.0/darwin-2level/Moose/Object.pm line 24
    Moose::Object::new('Something', 'HASH(0x7f91a3f3d430)') called at l line 28

Therefore I need to either:

treat the empty baz not as {} but as '', or
add some coercion to the package Something to coerce any empty hashrefs {} to empty strings ''.

Any idea for the easy way?

EDIT

So, the result. The accepted answer remains because it gives the answer to the above question.

But I must say: after 3 days of studying the new (very complex) module XML::Twig and learning the basics of XPath, I got a solution that is clearer and nicer than the XMLin solution.

With XMLin I needed to reorganize the resulting hashref, because I wanted only a few elements in an exactly defined structure (acceptable for a constructor). Such reorganizing (deleting unwanted members, moving deeper hashref values into arrayrefs and such) is easy in Perl, but the code isn't nice and has to cope with questions like the one above.

Using XML::Twig (and 2 follow-up questions here), the result was much cleaner, much more readable and much shorter than with XMLin. It really is better to sacrifice some time and learn (at least the basics of) XPath and such...

The easy way is: [Don't use `XML::Simple`](http://stackoverflow.com/questions/33267765/why-is-xmlsimple-discouraged) - there are much better alternatives available. — Sobrique, May 15 '16 at 14:54

score 2 · Accepted Answer · answered May 15 '16 at 09:30

XML::LibXML::Simple does not appear to have an option to enable this behavior.

XML::Simple does, though; set SuppressEmpty to an empty string to parse empty nodes as strings instead of containers:

# perl -MXML::Simple -MData::Dumper \
  -E '$h=XMLin("<some><bubu></bubu></some>", SuppressEmpty => ""); say Dumper $h'

$VAR1 = {
      'bubu' => ''
    };
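
If you would rather keep XML::LibXML::Simple, the second option from the question (coercion inside the class) might look roughly like the sketch below. The type name StrFromEmptyHash is made up for illustration, and the constructor argument is hard-coded to the shape the question reports for the second node (baz => {}), so no XML parsing is involved here.

```perl
use 5.014;
use warnings;

package Something {
    use Moose;
    use Moose::Util::TypeConstraints;

    # Hypothetical type name: behaves like Str, but can be coerced from a hashref.
    subtype 'StrFromEmptyHash', as 'Str';

    # Only an *empty* hashref is turned into ''. A non-empty hashref is
    # returned unchanged and will still fail the Str constraint.
    coerce 'StrFromEmptyHash',
        from 'HashRef',
        via { keys(%$_) ? $_ : '' };

    has 'bar' => (is => 'ro', isa => 'StrFromEmptyHash', coerce => 1);
    has 'baz' => (is => 'ro', isa => 'StrFromEmptyHash', coerce => 1);
}

# Same shape as the second <node> after XMLin: baz is an empty hashref.
my $obj = Something->new({ bar => 'bar2', baz => {} });
say 'bar is "', $obj->bar, '", baz is "', $obj->baz, '"';
```

Every attribute that may come from an empty tag needs both the coercing type and coerce => 1, which is the main cost of this approach compared with fixing the data before it reaches the constructor.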

score 1 · Answer 2 · answered May 15 '16 at 09:15

You can use the Data::Find module to traverse the hash and find paths to empty hash refs. You can then use eval to replace the empty hash refs with empty strings. Here is an example:

use strict;
use warnings;

use XML::LibXML::Simple;
use Data::Dumper;
use Data::Find qw/ diter /;

my $xml = <<XML;
<root>
<node>
    <bar>bar1</bar>
    <baz>baz1</baz>
</node>
<node>
    <bar>bar2</bar>
    <baz/>
</node>
</root>
XML

my $h = XMLin($xml);

my $iter = diter $h, sub {
    my $v = shift;

    defined $v and ref($v) eq "HASH" and !(keys %{ $v });
};

while (my $path = $iter->() )
{
    eval "\$h->$path = ''";
}

print Dumper($h);
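
The same replacement can also be done without Data::Find and without string eval, using a small recursive walk in core Perl. This is only a sketch: the sub name blank_empty_hashes is made up, and the input structure is hard-coded to the shape the question reports from XMLin rather than parsed from XML.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: walk hashes and arrays in place and replace every
# empty hashref found as a value or element with an empty string.
sub blank_empty_hashes {
    my ($data) = @_;

    if (ref $data eq 'HASH') {
        for my $key (keys %$data) {
            my $value = $data->{$key};
            if (ref $value eq 'HASH' && !%$value) {
                $data->{$key} = '';
            }
            else {
                blank_empty_hashes($value);
            }
        }
    }
    elsif (ref $data eq 'ARRAY') {
        # $item is an alias to the array element, so assigning to it
        # changes the array itself.
        for my $item (@$data) {
            if (ref $item eq 'HASH' && !%$item) {
                $item = '';
            }
            else {
                blank_empty_hashes($item);
            }
        }
    }
    return $data;
}

# Assumed shape of the XMLin result for the XML above.
my $h = {
    node => [
        { bar => 'bar1', baz => 'baz1' },
        { bar => 'bar2', baz => {} },
    ],
};

blank_empty_hashes($h);
print Dumper($h);
```

Note that the top-level structure itself is never replaced, only values and elements inside it.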

You can, but given you had XML::LibXML in there, why not just `findnodes` instead? — Sobrique, May 15 '16 at 20:27

score 1 · Answer 3 · edited May 23 '17 at 12:16

First off: Why is XML::Simple "Discouraged"?

XML::Simple doesn't make it easier, it makes it harder. I would advocate XML::Twig or XML::LibXML instead. In XML::Twig getting the 'value' of a node is exactly as you expect:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;
my $twig = XML::Twig -> parse ( \*DATA ); 

foreach my $node ( $twig -> findnodes('//node/*') ) {
    print $node -> tag, " => \"", $node -> text,"\"\n";
}

__DATA__
<root>
<node>
    <bar>bar1</bar>
    <baz>baz1</baz>
</node>
<node>
    <bar>bar2</bar>
    <baz/>
</node>
</root>

Gives:

bar => "bar1"
baz => "baz1"
bar => "bar2"
baz => ""

Which you could pass to your constructor.
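
As a rough sketch of that last step, each node can be collected into a hashref of tag => text pairs, which is the shape a constructor such as Something->new in the question expects. The variable names are illustrative, and the XML is inlined as a string instead of being read from DATA.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

my $xml = <<'XML';
<root>
<node>
    <bar>bar1</bar>
    <baz>baz1</baz>
</node>
<node>
    <bar>bar2</bar>
    <baz/>
</node>
</root>
XML

my $twig = XML::Twig->new->parse($xml);

# One hashref per <node>, keyed by child tag name.
my @constructor_args;
for my $node ( $twig->root->children('node') ) {
    # '#ELT' restricts the children to real elements (no text nodes);
    # text() returns '' for an empty element such as <baz/>.
    my %fields = map { $_->tag => $_->text } $node->children('#ELT');
    push @constructor_args, \%fields;
}

for my $args (@constructor_args) {
    print join( ", ", map { "$_ => \"$args->{$_}\"" } sort keys %$args ), "\n";
    # my $obj = Something->new($args);   # with the class from the question
}
```

Because text always returns a plain string, the empty-hashref problem never arises and no coercion or post-processing is needed.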

answered May 15 '16 at 15:10

Sobrique

wow - just saw the answer. Very nice code. Probably this is the right way :) Going to try implement it, but my real XML is much-much more complex, and to the constructor i need pass approx 20 different values (some of are arrayrefs and such.) I'm not sure how i will be successful to write the above searches. Anyway - thank you very much. :) – cajwine May 15 '16 at 19:27
Guarantee you, your more complicated code is easier to navigate via xpath. And there's quite a few that would be prepared to help figure it out. – Sobrique May 15 '16 at 20:25
You may find this [XPath Sandbox](http://grantm.github.io/perl-libxml-by-example/_static/xpath-sandbox/xpath-sandbox.html?q=//title) and the tutorial it links to useful for learning XPath. – Grant McLean May 15 '16 at 21:44
Few warning words for others, from a point of view of an XML and perl beginner as me. Using the discouraged `XML::Simple` I was able process my XML file using approx 20 lines of perl code. (1 line for calling the `XMLIn` + approx 20 lines of perl code for filtering and modifying the got nested hash). Time spent: approx 2 hours (to understand the complicated XML structure and write the mentioned 20 lines of code) **plus** another few hours waiting for the answer here. :) Result: _working solution_. – cajwine May 17 '16 at 11:51
Now, trying to understand the recommended `XML::Twig`. The module is sure great and probably allows solve any XML-related problem, but the docs are written _by experts_ and **for experts**. For a beginner like me it is **very hard** to progress. After spending approx 12 hours reading docs + trying to fetch the needed data from my XML file, I still have nothing working yet. Still not giving up, and probably will ask many questions here... ;) – cajwine May 17 '16 at 11:51
So, once when you mastering the module, will learn the XPath & ELT and many-many-many related things - it could be an the fast and recommended way parsing and fetching data from XML, but if you're an stupid perl and XML beginner (like me) and want FAST results - the `XML::Twig` probably isn't your way. – cajwine May 17 '16 at 11:52
Comments aren't really the place for this discussion. But you are wrong. No matter how much **bold text** you use. `XML::Twig` is a _correct_ solution to the problem (others exist). `XML::Simple` and `regex` are not. They are hacks, that will create brittle code. Ask a new question with your _actual code_ and I'll show you the `XML::Twig` solution, and I'd be prepared to bet it's shorter and neater. – Sobrique May 17 '16 at 12:10
I'm absolutely sure that you're right. I already said: _The module is sure great and probably allows solve any XML-related problem,_ and _it could be an the fast and recommended way parsing and fetching data from XML_. I'm not questioning your knowledge :) - just my knowledge isn't enough :) For a beginner it is hard. (I know, because I **am** a beginner, regardless of the bold. :) ). Sorry, I didn't want to "upset" you. :) – cajwine May 17 '16 at 12:22
I'm quite passionate about _not_ using `XML::Simple` or `regex` because I have seen far too many occasions where it's processed badly. It's a data transfer language with a formal specification - the point being, to allow both parties to transfer data in an implementation independent fashion. So when - one day - a script that "parses" XML badly breaks, because it can't handle 'valid' XML changes, whose fault is it? Who gets to fix it? – Sobrique May 17 '16 at 12:27
I don't understand why you mentioning regexes - probably nobody want process XML using regexes, it sounds to me as an stupidity... Youre right. Agree. And sorry again. :) – cajwine May 17 '16 at 12:31
It comes up more than you think, and has the same fundamental problem as `XML::Simple`. It doesn't work reliably. – Sobrique May 17 '16 at 12:41
Just for the correctness, you mean `for my $child...` instead of `for my $node...`. (or `print $node->tag`)... – cajwine May 18 '16 at 12:30

So, at the "end of day" - i must say: "Thank you, for convince me to using the `XML::Twig`". It has an steep learning curve - but the result is better. So, thanx again. Added an EDIT to the question itself. :) ;) – cajwine May 18 '16 at 17:23
