
I'm using Perl and XML::LibXML, and the XML I have to deal with looks like this:

<PARAM NAME = "A"><VALUE>1</VALUE>
<PARAM NAME = "B"><VALUE>3</VALUE>
<PARAM NAME = "C"><VALUE>43</VALUE>
<PARAM NAME = "A"><VALUE>6</VALUE>
<PARAM NAME = "B"><VALUE>3</VALUE>
<PARAM NAME = "C"><VALUE>13</VALUE>
.
.
.

The output I need is basically:

A    B    C
1    3    43
6    3    13

I've put the XPath expressions for the nodes into an array like this:

my @attributes = (
    './PARAM[@NAME = "A"]/VALUE',
    './PARAM[@NAME = "B"]/VALUE',
    .
    .
);

I then called findnodes() and findvalue() with these XPath expressions as arguments in a foreach loop, in a mistaken attempt to get one 'set' of values out at a time to write to a record. Naturally, findnodes() is wrong because it returns all nodes that match the expression on each pass through the loop (as it's supposed to), and findvalue() is wrong because it in effect does the same thing, just concatenating the values of all the like-named nodes.

Since the file is structured the way it is, I see no way to capture the 'A thru C' nodes/values, write a record, then repeat...at least not without checking every node to see if it's the 'last one' ('C'). It seems I need to process this as a plain old text file, basically.

Kirk Fleming

3 Answers


You did not say which language you're using, but it seems to be Perl. Basically, fetch all <VALUE/> elements (or rather their text nodes) and then loop over them, reading three values each time.

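In somewhat Perlish code, assuming $doc holds the parsed XML::LibXML document:

# $doc is an assumption: the document as parsed by XML::LibXML
my @values = map { $_->textContent } $doc->findnodes('//PARAM/VALUE');
my @records;
for (my $i = 0; $i < @values; $i += 3) {
    push @records, [ @values[$i .. $i + 2] ];   # one record = three consecutive values
}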

As a result, you should get an array of arrays (you could also return an array of hashes, of course). If you just want the output, use the pattern above with an appropriate call to printf instead of push.
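
The array-of-hashes variant could look roughly like the following. This is only a minimal sketch in core Perl: the hard-coded @names and @values are assumptions that stand in for whatever the XPath query returns, and the set size of three is taken from the sample data in the question.

```perl
use strict;
use warnings;

# Assumed inputs: the column names of one set, and the flat list of
# VALUE texts in document order (hard-coded here for illustration).
my @names  = qw(A B C);
my @values = (1, 3, 43, 6, 3, 13);

my @records;
# splice removes one set's worth of values from the front on each pass
# and returns an empty list (false) once @values is exhausted.
while (my @chunk = splice @values, 0, scalar @names) {
    my %record;
    @record{@names} = @chunk;    # hash slice: pair each name with its value
    push @records, \%record;
}

# One header line, then one tab-separated line per record.
print join("\t", @names), "\n";
print join("\t", @{$_}{@names}), "\n" for @records;
```

This only works while every set has the same names in the same order; if the set can change, the grouping has to key on the NAME attribute instead of the position.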

Jens Erat
  • Thank you. The source file isn't under my control so I need to assume the 'set' of attribute nodes can change--in fact I just found that they do. What I've done is: – Kirk Fleming Apr 23 '14 at 13:51

Here's an approach I took that works:

foreach my $parameter ( $raid_group->findnodes('PARAM')) {
    my $name  = $parameter->findvalue('@NAME');
    my $value = $parameter->findvalue('VALUE');
    if ($name eq $first_name_in_set ){
        [do stuff]
    }
}

This is a case of using a screwdriver for a chisel I think--expedient but not much more.
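
For reference, a fuller version of this loop might look like the sketch below. It rests on assumptions not stated in the thread: that each PARAM is properly closed in the real file, and that the PARAMs sit under a single parent element (called GROUP here, a made-up name). Instead of comparing against a known first name, it starts a new record whenever a name is already present in the current one, which is the same test the regex answer in this thread uses.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed input: well-formed XML with closed PARAM elements under a
# hypothetical GROUP parent.
my $xml = <<'XML';
<GROUP>
  <PARAM NAME="A"><VALUE>1</VALUE></PARAM>
  <PARAM NAME="B"><VALUE>3</VALUE></PARAM>
  <PARAM NAME="C"><VALUE>43</VALUE></PARAM>
  <PARAM NAME="A"><VALUE>6</VALUE></PARAM>
  <PARAM NAME="B"><VALUE>3</VALUE></PARAM>
  <PARAM NAME="C"><VALUE>13</VALUE></PARAM>
</GROUP>
XML

my $doc   = XML::LibXML->load_xml(string => $xml);
my $group = $doc->documentElement;

my (@headers, %seen, @records);
for my $param ($group->findnodes('PARAM')) {
    my $name  = $param->findvalue('@NAME');
    my $value = $param->findvalue('VALUE');

    # Remember each name once, in order of first appearance.
    push @headers, $name unless $seen{$name}++;

    # A name that already exists in the current record starts a new one.
    push @records, {} if !@records || exists $records[-1]{$name};
    $records[-1]{$name} = $value;
}

print join("\t", @headers), "\n";
print join("\t", map { $_ // '' } @{$_}{@headers}), "\n" for @records;
```

Because the grouping keys on the NAME attribute, it keeps working if the set of names changes, as long as no name repeats within one set.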

Kirk Fleming

Your data isn't actually valid XML, as there's no closing tag for any PARAM. You'll therefore either need to clean up the data before running it through an XML parser, or use a regular expression.

The following uses a regex to parse any number of fields and values:

use strict;
use warnings;

my %seen_header;
my @headers;
my @data = ({});    # start with one empty record

while (<DATA>) {
    if (m{<PARAM NAME = "(.*?)"><VALUE>(.*?)</VALUE>}i) {
        my $name = $1;
        my $val = $2;

        push @headers, $name if ! $seen_header{$name}++;
        push @data, {} if exists $data[-1]{$name};
        $data[-1]{$name} = $val;

    } else {
        warn "Unrecognized format at line $.: $_"
    }
}

print "@headers\n";
print join(' ', map {$_ // ''} @{$_}{@headers}), "\n" for (@data);

__DATA__
<PARAM NAME = "A"><VALUE>1</VALUE>
<PARAM NAME = "B"><VALUE>3</VALUE>
<PARAM NAME = "C"><VALUE>43</VALUE>
<PARAM NAME = "A"><VALUE>6</VALUE>
<PARAM NAME = "B"><VALUE>3</VALUE>
<PARAM NAME = "C"><VALUE>13</VALUE>

Outputs:

A B C
1 3 43
6 3 13

You could also adapt this code to use an XML parser, but I'll leave that up to you if it's what you want.

Miller