
I have a long list of values in XML, each tagged with a named identifier. I need to write one output file per distinct identifier, with the matching values grouped together and each file given a unique name.

So, for example, let's say I have:

<List>
   <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
      Hello World!
   </Item>
   <Item group="::this_other_long_and_complicated_group_name_that_cannot_be_a_filename::">
      Goodbye World!
   </Item>
   <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
      This example text should be in the first file
   </Item>
   <Item group="::this_other_long_and_complicated_group_name_that_cannot_be_a_filename::">
      This example text should be in the second file
   </Item>
   <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
      Hello World!
   </Item>
</List>

How can I write an XSLT 2.0 transformation that outputs these items grouped by @group, with unique values, into files with generated names? For example, the first @group would map to file1.xml and the second @group to file2.xml.

Jon W

1 Answer


Here is a solution that uses some of the good new features in XSLT 2.0:

This transformation:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/*">
        <!-- Remember the top element so each output file can reuse its name -->
        <xsl:variable name="vTop" select="."/>

        <!-- One group, and therefore one output file, per distinct @group value -->
        <xsl:for-each-group select="Item" group-by="@group">
            <!-- position() is the number of the current group: file1.xml, file2.xml, ... -->
            <xsl:result-document href="file:///C:/Temp/file{position()}.xml">
                <xsl:element name="{name($vTop)}">
                    <xsl:copy-of select="current-group()"/>
                </xsl:element>
            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>

when applied to the XML document provided in the question (corrected to be well-formed):

<List>
    <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
        Hello World!
    </Item>
    <Item group="::this_other_long_and_complicated_group_name_that_cannot_be_a_filename::">
        Goodbye World!
    </Item>
    <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
        This example text should be in the first file
    </Item>
    <Item group="::this_other_long_and_complicated_group_name_that_cannot_be_a_filename::">
        This example text should be in the second file
    </Item>
    <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
        Hello World!
    </Item>
</List>

produces the two wanted files: file1.xml and file2.xml.
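
The question also asks for the values to be unique, and the sample input repeats "Hello World!" within the first group. The following is only a sketch of one way to drop such repeats, assuming that two items count as duplicates when their whitespace-normalized text is equal; that rule is an assumption, not something stated in the question. It nests a second xsl:for-each-group inside the first. The relative href is also an assumption: it is resolved against the processor's base output URI, so the actual location depends on how the transformation is run.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/*">
        <xsl:variable name="vTop" select="."/>

        <!-- Outer grouping: one output file per distinct @group value -->
        <xsl:for-each-group select="Item" group-by="@group">
            <xsl:result-document href="file{position()}.xml">
                <xsl:element name="{name($vTop)}">
                    <!-- Inner grouping (assumed rule): items with the same
                         normalized text form one group; copy only the first -->
                    <xsl:for-each-group select="current-group()"
                                        group-by="normalize-space(.)">
                        <xsl:copy-of select="current-group()[1]"/>
                    </xsl:for-each-group>
                </xsl:element>
            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```

With the sample document, the first file would then hold two Item elements instead of three, because the second "Hello World!" item falls into the same inner group as the first.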

Dimitre Novatchev
  • This is a great solution, Dimitre. I always see great work from you. However we're looking to place the grouped text together in two files. Is there some extra syntax we can add on to make this happen? – Jon W Mar 12 '09 at 19:22
  • @Jweede You haven't specified the criteria upon which to perform the grouping: Which nodes should go to which of the two files -- according to what rule? – Dimitre Novatchev Mar 12 '09 at 19:35
  • @Dimitre: I had a hard time articulating this problem, I'm sorry if it wasn't clear. There are two distinct group attributes, we want to put them together. i.e.: place all of the matching group attributes into a file together. – Jon W Mar 12 '09 at 19:47
  • @Jweede Then just replace group-by="normalize-space(.)" with group-by="normalize-space(@group)". I will edit my answer shortly. – Dimitre Novatchev Mar 12 '09 at 20:12
  • @Dimitre: +1 -- I was about to post a similar solution, but given the fact that my knowledge of XSLT 2.0 is somewhat limited, and had no tool to test my idea, I rather waited. ;-) Can you recommend a free (and ideally lightweight) XSLT 2.0 processor that runs on Windows? – Tomalak Mar 13 '09 at 09:48
  • @Tomalak The ultimate #1 XSLT 2.0 processor is Saxon (9.x). Its Basic version (not SA, i.e. schema-aware) is open source, and I have used Saxon for more than 5 years. It is the fastest and most optimized. The developer is Dr. Michael Kay himself -- the editor of the W3C XSLT 2.0 specification. Simply and indisputably the best. – Dimitre Novatchev Mar 13 '09 at 13:17
  • @Tomalak The comment's length limit didn't let me say "thanks" in the last comment :) – Dimitre Novatchev Mar 13 '09 at 13:18
  • @Dimitre: Never mind. Trying Saxon-B 9.1 for .NET right now. It's the obvious choice that I somehow didn't think of. Thanks for the tip. – Tomalak Mar 13 '09 at 14:13
  • @Tomalak Maybe you need to know that Saxon 9.x for Java is about 3 times faster than Saxon.NET -- this is because Saxon.NET interprets the Java bytecode of Saxon.Java – Dimitre Novatchev Mar 13 '09 at 16:35
  • @Dimitre: This is good info, I wasn't aware of that. But it's somewhat logical that they are not maintaining two different codebases. – Tomalak Mar 13 '09 at 16:49
  • On Windows there's a program called Kernow that makes XSLT 2.0 testing really simple. – Jon W Jun 27 '09 at 16:15
  • @jweede I have been using the XSelerator for 8 years and it is still the best XSLT IDE. I have no reason to switch to something else – Dimitre Novatchev Jun 27 '09 at 18:37