
Background

I'm designing a Perl application that uses XML files as inputs for config and settings information. There will be a hierarchy of documents, with global data overridden by more local information.

My program will be invoked with the most local settings file, which will contain paths to the more general files. Some local settings will be absolute; which ones they are will be hard-coded in the program.

The initialization task is to collect the settings for an invocation, starting at the highest level, then reading each lower level in turn and merging them all into a single XML document.

Sample Data

Global_layouts_100.xml

<CONFIG>
    <GRP1>
        <FIELD foo="abs" format="%.4f">QTY</FIELD>
        <FIELD default="" format="%.2f">COST</FIELD>
        <FIELD default="0" format="%.2f">AMT</FIELD>
        <FIELD default="1960-01-01" format="YYYMMDD">TRANDATE</FIELD>
        <FIELD>ACCOUNT</FIELD>
        <FIELD default="0">ACCT_TYPE</FIELD>
    </GRP1>
    <GRP2>
        <FIELD> 1 </FIELD>
        <FIELD> 2 </FIELD>
        <FIELD> 3 </FIELD>
    </GRP2>
</CONFIG>

Global_properties_100.xml

<CONFIG>
    <CUS>
        <GRP>GRP1</GRP>
        <HDR>CUSTOMER</HDR>
        <TLR>TLR${cnt}</TLR>
    </CUS>
    <XYZ>
        <GRP>GRP2</GRP>
        <HDR>ACCOUNTS</HDR>
        <TLR>TLR${cnt}</TLR>
    </XYZ>
</CONFIG>

Global_70.xml

<CONFIG>
<PARENT_SETTINGS>Global_layouts_100</PARENT_SETTINGS>
<PARENT_SETTINGS>Global_properties_100</PARENT_SETTINGS>
    <LOOKUPS>
        <MAP type="file">
            <NAME>ACCT_TYPE_LOOKUP</NAME>
            <PATH>${PATH}acct_type.csv</PATH>
            <HEADERS>
                <COLUMN>ACCT_TYPE</COLUMN>
                <COLUMN>SOURCE_VALUE</COLUMN>
            </HEADERS>
            <KEYS>
                <COLUMN>SOURCE_VALUE</COLUMN>
            </KEYS>
        </MAP>
    </LOOKUPS>
</CONFIG>

local.xml

<CONFIG>
    <PARENT_SETTINGS>Global_70</PARENT_SETTINGS>
    <BATCH>
        <CUS>
            <SRCFILE type="csv" delimiter="|">/path/to/src_file</SRCFILE>
            <OUTFILE>/path/to/out_file</OUTFILE>
            <FIELDS>
                <CUSTOMER>&CUSTOMER;</CUSTOMER>
                <QTY default="0.0" col="23"></QTY>
                <COST format="%.4f" col="21"></COST>
                <FEE col="18"></FEE>
            </FIELDS>
        </CUS>
        <XYZ>
            <SRCFILE />
            <OUTFILE />
            <FIELDS>
                <FIELD_1 />
                <FIELD_2 />
                <FIELD_3 />
                <FIELD_4 />
                <FIELD_5 />
            </FIELDS>
        </XYZ>
    </BATCH>
</CONFIG>

Now, if the program is given local.xml to start with and CUS as the argument to process, I'd like to see this XML (or an equivalent Perl data structure):

<CONFIG>
    <HDR>CUSTOMER</HDR>
    <TLR>TLR${cnt}</TLR>
    <SRCFILE type="csv" delimiter="|">/path/to/src_file</SRCFILE>
    <OUTFILE>/path/to/out_file</OUTFILE>
    <LOOKUPS>
        <MAP type="file">
            <NAME>ACCT_TYPE_LOOKUP</NAME>
            <PATH>${PATH}acct_type.csv</PATH>
            <HEADERS>
                <COLUMN>ACCT_TYPE</COLUMN>
                <COLUMN>SOURCE_VALUE</COLUMN>
            </HEADERS>
            <KEYS>
                <COLUMN>SOURCE_VALUE</COLUMN>
            </KEYS>
        </MAP>
    </LOOKUPS>
    <CUS>
        <FIELD foo="abs" format="%.4f" default="0.0" col="23">QTY</FIELD>
        <FIELD default="" format="%.4f" col="21">COST</FIELD>
        <FIELD default="0" format="%.2f">AMT</FIELD>
        <FIELD default="1960-01-01" format="YYYMMDD">TRANDATE</FIELD>
        <FIELD>ACCOUNT</FIELD>
        <FIELD default="0">ACCT_TYPE</FIELD>
        <FIELDS>
            <CUSTOMER>&CUSTOMER;</CUSTOMER>
            <QTY default="0.0" col="23"></QTY>
            <COST format="%.4f" col="21"></COST>
            <FEE col="18"></FEE>
        </FIELDS>
    </CUS>
</CONFIG>

And if the program is given local.xml to start with and XYZ as the argument to process, I'd like to see this XML (or an equivalent Perl data structure):

<CONFIG>
    <HDR>ACCOUNTS</HDR>
    <TLR>TLR${cnt}</TLR>
    <SRCFILE />
    <OUTFILE />
    <LOOKUPS>
        <MAP type="file">
            <NAME>ACCT_TYPE_LOOKUP</NAME>
            <PATH>${PATH}acct_type.csv</PATH>
            <HEADERS>
                <COLUMN>ACCT_TYPE</COLUMN>
                <COLUMN>SOURCE_VALUE</COLUMN>
            </HEADERS>
            <KEYS>
                <COLUMN>SOURCE_VALUE</COLUMN>
            </KEYS>
        </MAP>
    </LOOKUPS>
    <XYZ>
        <FIELD> 1 </FIELD>
        <FIELD> 2 </FIELD>
        <FIELD> 3 </FIELD>
        <FIELDS>
            <FIELD_1 />
            <FIELD_2 />
            <FIELD_3 />
            <FIELD_4 />
            <FIELD_5 />
        </FIELDS>
    </XYZ>
</CONFIG>

Question

What is the most efficient way of merging these XML documents?

I can do it myself with the data structures returned by XML::Simple, or maybe there are some other XML tools I should use?

I hope my question is clear enough and does not need sample XML data. If you need to see something then I can post some sample stuff.

The question in brief is, what is the best way to merge a hierarchy of individual XML documents?

lzc
  • *Never ever* use `XML::Simple`. There is no need to combine the documents into a single XML document. I suggest that you build a simple Perl data structure that contains the values that you need. By processing the data files from the most general to the most specific, the more general values will be overwritten by the more specific ones – Borodin Sep 06 '15 at 23:11
  • We can give you a better example with some sample data. – Sobrique Sep 07 '15 at 09:00
  • Yeah, I keep seeing everybody maligning `XML::Simple`, but in my case, my XML docs will not hold data, just config and preference and/or meta-data type stuff. And "Simple" seems simple to me. What would be the alternative? – lzc Sep 07 '15 at 14:19
  • As the author of XML::Simple, I'd definitely agree that you should not use it. I personally [use and recommend XML::LibXML](http://www.perlmonks.org/index.pl?node_id=490846) but XML::Twig is another popular choice. – Grant McLean Sep 08 '15 at 02:27
  • @GrantMcLean **thank you!** I don't need more than the author's advice! And that article is excellent, I'll be selling to my higher-ups to make the switch, while I myself will need to learn the XPath expression language. If you could suggest advice/direction on the original question, I'd greatly appreciate it. Cheers – lzc Sep 08 '15 at 14:22
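
The layering Borodin describes in the comments above (process the most general data first, let more specific data overwrite it) can be sketched in plain Perl once each file has been reduced to a hash. This is only an illustration under assumptions: `merge_settings` is a hypothetical helper, not part of any module, and the hashes below are hand-copied from the sample attributes rather than parsed from the XML files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# merge_settings is a hypothetical helper, not part of any module.
# It overlays $local on $global: nested hashes are merged key by key,
# any other value in $local replaces the one in $global.
sub merge_settings {
    my ($global, $local) = @_;
    my %merged = %$global;    # shallow copy; untouched branches stay shared
    for my $key (keys %$local) {
        if (ref($merged{$key}) eq 'HASH' && ref($local->{$key}) eq 'HASH') {
            $merged{$key} = merge_settings($merged{$key}, $local->{$key});
        }
        else {
            $merged{$key} = $local->{$key};
        }
    }
    return \%merged;
}

# Attribute values copied from the sample files, already reduced to hashes.
my $global = {
    QTY  => { foo => 'abs', format => '%.4f' },
    COST => { default => '', format => '%.2f' },
    AMT  => { default => '0', format => '%.2f' },
};
my $local = {
    QTY  => { default => '0.0', col => '23' },
    COST => { format => '%.4f', col => '21' },
};

# Process from the most general level to the most specific.
my $config = {};
$config = merge_settings($config, $_) for $global, $local;

for my $field (sort keys %$config) {
    my $atts = $config->{$field};
    print "$field: ", join(', ', map { "$_=$atts->{$_}" } sort keys %$atts), "\n";
}
```

With more levels in the hierarchy, the same loop would simply take a longer list, ordered from the most general file to `local.xml`.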

2 Answers

Preface: I know nothing of Perl but one option is to use XSLT, a declarative special-purpose language to style/transform XML documents.

I do know that most languages, such as PHP (somewhat a Perl descendant), Python, Java, and C#, maintain XML libraries and, likewise, XSLT processors. So consider using a Perl XSLT processor with an XSLT file that merges the documents; in the stylesheet you can select the particular nodes you want.

Using your sample data, the stylesheets below would render your final XML structures for CUS and XYZ. Be sure to keep all of the referenced XML files in the same directory.

CUS VERSION

<?xml version="1.0" ?> 
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> 

 <xsl:template match="CONFIG">

    <xsl:copy> 
         <xsl:copy-of select="document('Global_properties_100.xml')/CONFIG/CUS/HDR" />
         <xsl:copy-of select="document('Global_properties_100.xml')/CONFIG/CUS/TLR" />
         <xsl:copy-of select="BATCH/CUS/SRCFILE" />
         <xsl:copy-of select="BATCH/CUS/OUTFILE" />
         <xsl:copy-of select="document('Global_70.xml')/CONFIG/LOOKUPS" />
         <CUS>
            <xsl:copy-of select="document('Global_layouts_100.xml')/CONFIG/GRP1/*" />
            <xsl:copy-of select="BATCH/CUS/FIELDS" />
         </CUS>
    </xsl:copy>

 </xsl:template> 

</xsl:transform>

XYZ VERSION

<?xml version="1.0" ?> 
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> 

 <xsl:template match="CONFIG">

    <xsl:copy> 
         <xsl:copy-of select="document('Global_properties_100.xml')/CONFIG/XYZ/HDR" />
         <xsl:copy-of select="document('Global_properties_100.xml')/CONFIG/XYZ/TLR" />
         <xsl:copy-of select="BATCH/XYZ/SRCFILE" />
         <xsl:copy-of select="BATCH/XYZ/OUTFILE" />
         <xsl:copy-of select="document('Global_70.xml')/CONFIG/LOOKUPS" />
         <XYZ>
            <xsl:copy-of select="document('Global_layouts_100.xml')/CONFIG/GRP2/*" />
            <xsl:copy-of select="BATCH/XYZ/FIELDS" />
         </XYZ>
    </xsl:copy>

 </xsl:template> 

</xsl:transform>
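
For the Perl side, a stylesheet like the ones above could be applied with XML::LibXSLT. The following is a minimal sketch, not a tested setup for the real files: to stay self-contained it uses a cut-down stylesheet and a cut-down `local.xml`, both inline, and leaves out the `document()` calls. Note also that the sample `local.xml` contains `&CUSTOMER;`, which a parser will reject unless that entity is declared.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Cut-down stylesheet: copies only the CUS source and output file elements.
my $xsl = <<'XSL';
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="CONFIG">
    <xsl:copy>
      <xsl:copy-of select="BATCH/CUS/SRCFILE" />
      <xsl:copy-of select="BATCH/CUS/OUTFILE" />
    </xsl:copy>
  </xsl:template>
</xsl:transform>
XSL

# Cut-down local.xml (no undeclared entities).
my $xml = <<'XML';
<CONFIG>
  <BATCH>
    <CUS>
      <SRCFILE type="csv" delimiter="|">/path/to/src_file</SRCFILE>
      <OUTFILE>/path/to/out_file</OUTFILE>
    </CUS>
  </BATCH>
</CONFIG>
XML

# Parse the stylesheet and the input, then run the transformation.
my $style_doc  = XML::LibXML->load_xml(string => $xsl);
my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
my $source     = XML::LibXML->load_xml(string => $xml);
my $result     = $stylesheet->transform($source);

# Serialize the result according to the stylesheet's output settings.
my $output = $stylesheet->output_as_bytes($result);
print $output;
```

To run one of the full stylesheets against files on disk, `load_xml(location => ...)` can be used in place of `load_xml(string => ...)`.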
Parfait
  • I don't think this helps at all. It would create a document that contains *all* of the data from all documents, and it assumes that the tag names are different across documents. I think you've just encountered XSLT and are excited to use it in every possible situation – Borodin Sep 06 '15 at 23:20
  • @Borodin As shown, you can select particular nodes from external documents. You can even create templates from external docs, choosing which tags of same name. And yes, I am excited about XSLT which I often find to be a dismissed, forgotten language while many programmers use general purpose, bulky coding to restructure XMLs which can natively be done in a stylesheet. If only I could show you! I'll await the OP. – Parfait Sep 06 '15 at 23:36
  • XSLT is ignored mostly because it is a badly-designed declarative language that is hard to learn and understand. I think the OP's problem is that he has something like **global.xml** containing `xxx` and **local.xml** which contains `yyy`. What he thinks he wants, and what you're trying to provide with your new toy, is another XML document that looks like the local file, as it has overridden the global file. What you're actually providing is `xxxyyy` – Borodin Sep 06 '15 at 23:57
  • . . . and what I think he wants is a simple Perl hash `my %config = ( value => 'yyy' )` – Borodin Sep 06 '15 at 23:58
  • @Borodin you win :) I feel I ignited some hard feelings about XSLT and its true usefulness. I just wonder, if not XSLT, then what is the alternative? As I'm working on doing this merging on my own with Perl, it occurred to me that I'd rather "stack out my issue" (pun intended) and see if there are better smarties out there who've been there, done that ... – lzc Sep 07 '15 at 18:46
  • XSLT has a niche, but there's a reason why scripting languages (like `perl`) are popular - because there's a lot of libraries like `XML::Twig` and `XML::LibXML` that let you do all the XML manipulation you might like. – Sobrique Sep 08 '15 at 13:14
  • @Borodin "XSLT is ignored mostly because it is a badly-designed declarative language that is hard to learn and understand." This may be your personal opinion but most people would disagree. – nwellnhof Sep 08 '15 at 23:55
  • @Sobrique - Indeed, XSLT does serve a special purpose need, certainly not an end-all, be-all language much like SQL, also a declarative language. It does take some getting used to being a recursive template language and not object-oriented one. For OP, consider my edited answer using your sample data that I ran through with PHP and Python. At the very least it can help future readers. – Parfait Sep 09 '15 at 02:16

I can give you a more specific example with some sample data, but when approaching this I tend to use XML::Twig.

Specifically, XML::Twig has built-in support for cut and paste, so you can build a new document tree and preserve the elements you want, in the order you want.

Something like this:

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig -> parse ( \*DATA );

my $newdoc = XML::Twig -> new ('pretty_print' => 'indented_a');
$newdoc -> set_root ( XML::Twig::Elt -> new ( 'new_root_here' ) );
$newdoc -> set_xml_version ('1.0');
$newdoc -> set_encoding('utf-8'); 

foreach my $value_elt ( $twig -> findnodes ( '//value' ) ) {
    $value_elt -> cut;
    $value_elt -> paste ( $newdoc -> root );
}


$newdoc -> print;

__DATA__
<root>
   <value>fish</value>
   <dont_copy>this thing</dont_copy>
</root>

(There's another example: How to I combine data from two XML files into the same structure?)
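
Applied to the question's sample data, the same module can also overlay the local attributes onto the global `FIELD` elements. The following is a rough sketch under assumptions: the two documents are cut-down inline copies of `Global_layouts_100.xml` and the `FIELDS` block of `local.xml`, and matching a local element to a global field by comparing its tag name with the field's text is my reading of the desired output, not something the question states.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Cut-down copies of the sample documents.
my $global_xml = <<'XML';
<CONFIG>
  <GRP1>
    <FIELD foo="abs" format="%.4f">QTY</FIELD>
    <FIELD default="" format="%.2f">COST</FIELD>
    <FIELD default="0" format="%.2f">AMT</FIELD>
  </GRP1>
</CONFIG>
XML

my $local_xml = <<'XML';
<CONFIG>
  <FIELDS>
    <QTY default="0.0" col="23"></QTY>
    <COST format="%.4f" col="21"></COST>
  </FIELDS>
</CONFIG>
XML

my $global = XML::Twig->new(pretty_print => 'indented');
$global->parse($global_xml);
my $local = XML::Twig->new;
$local->parse($local_xml);

# Index the local override elements by tag name (QTY, COST, ...).
my %override = map { $_->tag => $_ } $local->root->first_child('FIELDS')->children;

my $grp = $global->root->first_child('GRP1');
for my $field ($grp->children('FIELD')) {
    my $over = $override{ $field->trimmed_text } or next;
    my $atts = $over->atts || {};
    # Local attributes are added to, or replace, the global ones.
    $field->set_att($_ => $atts->{$_}) for keys %$atts;
}

$grp->print;
```

Fields with no local counterpart, such as AMT here, are left as they were in the global document.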

Sobrique