How can I parse/extract a list of values from an XML file with well nested values?

I have tried XML::Simple, but I can only extract the first value from a list of more than 10 values. I am trying to select the dataset whose seriesName is "Temperature" and extract every value under that group.

This is the XML file I am parsing, T124.xml (abridged here because the original is a huge file):

<chart caption="" subcaption="" palette="6" yAxisMinVal="11800" yAxisMaxVal=17800"xmlns="http: ">
<categories>
<category label=""/>
<category label=""/>
<category label=""/>
<category label=""/>
<category label=""/>
<category label="6"/>
<category label=""/>
<category label=""/>
<category label=""/>
<category label=""/>
<category label="12"/>
<category label=""/>
<category label=""/>
<category label="18"/>
<category label=""/>
<category label=""/>
<category label="21"/>
<category label=""/>
</categories>
- <dataset seriesName="Temperature" color="003366">
<Set value=675.0"/>
<Set value=613.0"/>
<Set value=612.0"/>
<Set value=614.0"/>
<Set value=613.0"/>
<Set value=413.0"/>
<Set value=613.0"/>
<Set value=313.0"/>
<Set value=213.0"/>
<Set value=653.0"/>
<Set value=633.0"/>
<Set value=623.0"/>
</dataset>
<dataset seriesName="Precipitation" color="66CC33">
<set value="50.6"/>
</dataset>
</chart>

Here is the Perl code I used:

#!/usr/bin/perl
use strict; 
use XML::Simple 'XMLin';
use Data::Dumper;

my $parse = XMLin('T124.xml',forcearray => ['value']);
#print Dumper($parse);

foreach my $dataset (@{$parse->{dataset}}) {
    if ($dataset->{seriesName} eq 'Temperature') {
        print $dataset->{seriesName} . "\n";
        print $dataset->{set}->[0]->{value} . "\n";
    }
}

I would like to see the following output, but I am only able to extract the first value (675.0):

Temperature
675.0 
613.0
612.0
614.0
613.0 

ETC...

BrianB
  • Don't use [`XML::Simple`](https://metacpan.org/pod/XML::Simple) as it's [*outdated*](https://metacpan.org/pod/XML::Simple#STATUS-OF-THIS-MODULE). Instead use [`XML::LibXML`](https://metacpan.org/pod/XML::LibXML) or [`XML::Twig`](https://metacpan.org/pod/XML::Twig) so you can utilize XPaths to access deep nodes. – Miller Oct 07 '14 at 22:30
  • Is your source missing quotes for the temperature values, or is that a problem in the paste? – Jim Davis Oct 07 '14 at 23:03
  • Hi Jim, the source XML is at the top. I show the context of the file .... – BrianB Oct 08 '14 at 16:11

1 Answer

Here's a simple script to extract the temperature data from your XML. I've included the XML source here because the source you provided seems to be missing some double quotes. I've used XML::Twig to parse the data.

#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

my $xml = '<chart caption="" subcaption="" palette="6" yAxisMinVal="11800" yAxisMaxVal="17800" xmlns="http://">
<categories>
<category label=""/>
<category label=""/>
<category label=""/>
<category label=""/>
<category label=""/>
<category label="6"/>
<category label=""/>
<category label=""/>
<category label=""/>
<category label=""/>
<category label="12"/>
<category label=""/>
<category label=""/>
<category label="18"/>
<category label=""/>
<category label=""/>
<category label="21"/>
<category label=""/>
</categories>
<dataset seriesName="Temperature" color="003366">
<Set value="675.0"/>
<Set value="613.0"/>
<Set value="612.0"/>
<Set value="614.0"/>
<Set value="613.0"/>
<Set value="413.0"/>
<Set value="613.0"/>
<Set value="313.0"/>
<Set value="213.0"/>
<Set value="653.0"/>
<Set value="633.0"/>
<Set value="623.0"/>
</dataset>
<dataset seriesName="Precipitation" color="66CC33">
<set value="50.6"/>
</dataset>
</chart>';

my $t = XML::Twig->new();
$t->parse( $xml );   # or $t->parsefile( $filename ); to read from a file

# this xpath finds all <Set> elements under the <dataset> element
# where attribute "seriesName" = "Temperature"
my @sets = $t->findnodes('//dataset[@seriesName="Temperature"]/Set');

if (@sets) {

    my $outfile = '/path/to/output/file.txt';
    open my $out, ">", $outfile or die "Could not open $outfile: $!";
    print { $out } "Temperature\n";
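    print { $out } $_->att('value')."\n" for @sets;
    # close explicitly so that buffered write errors are reported
    close $out or die "Could not close $outfile: $!";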
}

Output (contents of the output file):

Temperature
675.0
613.0
612.0
614.0
613.0
413.0
613.0
313.0
213.0
653.0
633.0
623.0
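
For reading from a file instead of an embedded string, a minimal sketch along the same lines follows. It assumes XML::Twig is installed; the sub name series_values and the way the file name is passed in are made up for illustration. safe_parsefile wraps the parse in an eval, so malformed XML gives a false return value with the parser's message in $@ rather than an uncaught die.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns the "value" attribute of every <Set>
# under the <dataset> with the given seriesName. Dies with the parser's
# message if the file is not well-formed XML.
sub series_values {
    my ($file, $series) = @_;

    my $twig = XML::Twig->new();

    # safe_parsefile returns false (and sets $@) on a parse error
    $twig->safe_parsefile($file)
        or die "Could not parse $file: $@";

    # same XPath as above, with the series name interpolated
    my @sets = $twig->findnodes(qq{//dataset[\@seriesName="$series"]/Set});
    return map { $_->att('value') } @sets;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    print "$_\n" for series_values($ARGV[0], 'Temperature');
}
```

If the file is missing quotes like the XML in the question, this sketch stops with the parser's error message, which usually includes the line and column of the first problem.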
i alarmed alien
  • Thanks very much, "i alarmed alien". I replaced say with print and added "\n" to get the same results. You have saved me a lot of pain. That said, I am attempting this script using input and output files. I want to push the extracted XML data into a CSV file, but I'm not sure how to do this with "twig". – BrianB Oct 08 '14 at 01:30
  • Sorry, I wasn't being clear again. Printing to the output file isn't the issue (though I will use what you've provided); it's the parsefile syntax for the input (XML) file that I'm lost on. I thought `$t->parsefile("T123.xml")` would be correct but... – BrianB Oct 08 '14 at 16:09
  • What happens when you do `$t->parsefile(...)`? The XML you posted was invalid, so I had to correct it before I could do any parsing. You may well have to do the same thing. NB: from the docs for XML::Twig: *"A die call is thrown if a parse error occurs."* – i alarmed alien Oct 08 '14 at 16:17
  • Hi i alarmed alien, I pasted the full XML into the script, had to reduce the content of the first line under and it works fine. However, when I attempt parsefile I get the following error: "you seem to have used the parse method on a filename (*XML::Parser::FILE)" – BrianB Oct 08 '14 at 18:29
  • That warning only comes up when using the `parse` subroutine according to the source code. What code are you using? – i alarmed alien Oct 08 '14 at 18:37
  • `my $t = XML::Twig->new( twig_handlers => { 'section/title' => sub { $_->print } } )->parsefile('T123.xml');` worked. Thanks for your help, i alarmed alien. – BrianB Oct 08 '14 at 19:26
  • I am so upset now. It didn't work; I ran the wrong script, the one with the embedded XML content. parsefile still does not work. Looking at the code I submitted, why would that work... – BrianB Oct 08 '14 at 19:48
  • Thanks 'i alarmed alien'. You were right, the XML file was the problem, so I will adapt the script to clean the XML file before parsing it. Unbelievable. Thanks again. FYI, I am using parsefile(). – BrianB Oct 08 '14 at 23:02
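
On the CSV question raised in the comments: once the values are in a Perl list, writing them out needs nothing from XML::Twig. A minimal sketch using only core Perl follows. The sub names csv_field and write_series_csv and the two-column layout (series name, value) are assumptions for illustration; for anything beyond simple fields, a dedicated CSV module such as Text::CSV is likely the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote a field only if it contains a comma, a double quote or a line
# break; embedded double quotes are doubled.
sub csv_field {
    my ($field) = @_;
    if ($field =~ /[",\r\n]/) {
        (my $escaped = $field) =~ s/"/""/g;
        return qq{"$escaped"};
    }
    return $field;
}

# Hypothetical layout: a header row, then one "series,value" row per value.
# Returns the number of data rows written.
sub write_series_csv {
    my ($fh, $series, @values) = @_;
    print {$fh} "series,value\n";
    print {$fh} join(',', csv_field($series), csv_field($_)), "\n" for @values;
    return scalar @values;
}

# Example values copied from the answer's output; in a real script they
# would come from something like: map { $_->att('value') } @sets
my @values = qw(675.0 613.0 612.0);
write_series_csv(\*STDOUT, 'Temperature', @values);
```

To write to a file instead of the terminal, pass a handle opened with the three-argument form of open (as in the answer's script) in place of \*STDOUT.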