I have multiple types of XML messages that I need to "compact" by grouping multiple nodes under the same parent (same parent meaning the parents share the same node name and all of their attributes are equal). For example:

<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
    </Ratings>
</TopLevel>
<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
    </Ratings>
</TopLevel>
<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
    </Ratings>
</TopLevel>
<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="30" Number="3"/>
              <RatingByNumber Code="X" Rating="39" Number="4"/>
          </Rating>
    </Ratings>
</TopLevel>

Notice how they all share the same CodeTL attribute, and the last two share the same CodeA, Start and End attributes. What I need is to produce the following output using an XSLT stylesheet:

<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
          <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
          <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
              <RatingByNumber Code="X" Rating="30" Number="3"/>
              <RatingByNumber Code="X" Rating="39" Number="4"/>
          </Rating>
    </Ratings>
</TopLevel>

This is much cleaner and saves space; depending on the application consuming it, it might also save processing time.

The problem is that I have different types of XML messages with different node names and attributes (and numbers of attributes), but they all share the structure shown here. A generic way to handle all of them would be great, but I would also be grateful for an XSLT stylesheet that transforms the example I provided, so that I can write custom code for every XML message I need to send out.

Ed Fox

2 Answers

This XSLT 1.0 stylesheet produces the desired result:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="byCodeTL" match="TopLevel" use="@CodeTL"/>
    <xsl:key name="byAttrs" match="Rating" 
             use="concat(../../@CodeTL, '|', @CodeA, '|', @Start, '|', @End)"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="TopLevel[generate-id()=
                                  generate-id(key('byCodeTL', @CodeTL)[1])]">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <Ratings>
                <xsl:apply-templates 
                        select="key('byCodeTL', @CodeTL)/Ratings/*"/>
            </Ratings>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="Rating[generate-id()=
                                generate-id(key('byAttrs', 
            concat(../../@CodeTL, '|', @CodeA, '|', @Start, '|', @End))[1])]">
        <xsl:copy>
            <xsl:apply-templates select="@*|key('byAttrs', 
                concat(../../@CodeTL, '|', @CodeA, '|', @Start, '|', @End))/*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="TopLevel"/>
    <xsl:template match="Rating"/>
</xsl:stylesheet>

All TopLevel elements are grouped by their CodeTL attribute. All Rating elements are grouped by a combination of their attributes and the CodeTL attribute of their corresponding TopLevel.
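
The effect of the composite key can be checked outside an XSLT processor. The following is a minimal Python sketch, not part of the stylesheet above: it builds the same kind of key (the `CodeTL` of the enclosing `TopLevel` plus the `Rating` attributes) as a tuple and collects the `Number` values per group. The function name `rating_groups` and the shortened sample document are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample: two TopLevel codes ("A" and "B") whose
# Rating elements have identical attributes.
SAMPLE = """<t>
  <TopLevel CodeTL="A"><Ratings>
    <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012"><RatingByNumber Number="1"/></Rating>
  </Ratings></TopLevel>
  <TopLevel CodeTL="B"><Ratings>
    <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012"><RatingByNumber Number="2"/></Rating>
  </Ratings></TopLevel>
  <TopLevel CodeTL="A"><Ratings>
    <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012"><RatingByNumber Number="3"/></Rating>
  </Ratings></TopLevel>
</t>"""


def rating_groups(root):
    """Map (CodeTL, CodeA, Start, End) to the Number values found in that group."""
    groups = {}
    for top in root.iter("TopLevel"):
        for rating in top.iter("Rating"):
            # Same idea as the 'byAttrs' key: the grandparent's CodeTL is part of the key.
            key = (top.get("CodeTL"), rating.get("CodeA"),
                   rating.get("Start"), rating.get("End"))
            groups.setdefault(key, []).extend(child.get("Number") for child in rating)
    return groups


groups = rating_groups(ET.fromstring(SAMPLE))
for key, numbers in groups.items():
    print(key, numbers)
```

Because `CodeTL` is part of the key, the `Rating` under `CodeTL="B"` stays in its own group even though its other attributes match those under `CodeTL="A"`.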

Wayne
  • that seems to work but fails when there are 2 TopLevel nodes with different code but have the same children (they get grouped under the first node that appears in the file). for example http://pastebin.com/0EPpnycL – Ed Fox Jul 10 '12 at 04:00
  • @EdFox - Good point. We should include the grandparent `@CodeTL` in the `Rating` group key. See my edit. – Wayne Jul 10 '12 at 06:35

I. This generic XSLT 2.0 transformation:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my" exclude-result-prefixes="xs my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
     <t>
       <xsl:sequence select="my:grouping(*)"/>
     </t>
 </xsl:template>

 <xsl:function name="my:grouping" as="node()*">
   <xsl:param name="pElems" as="element()*"/>

   <xsl:if test="$pElems">
       <xsl:for-each-group select="$pElems" group-by="my:signature(.)">
         <xsl:copy>
          <xsl:copy-of select="@*"/>

            <xsl:sequence select="my:grouping(current-group()/*)"/>
         </xsl:copy>
       </xsl:for-each-group>
   </xsl:if>
 </xsl:function>

 <xsl:function name="my:signature" as="xs:string">
  <xsl:param name="pElem" as="element()"/>

  <xsl:variable name="vsignAttribs" as="xs:string*">
      <xsl:for-each select="$pElem/@*">
       <xsl:sort select="name()"/>

       <xsl:value-of select="concat(name(), '=', .,'|')"/>
      </xsl:for-each>
  </xsl:variable>

  <xsl:sequence select=
  "concat(name($pElem), '|', string-join($vsignAttribs, ''))"/>
 </xsl:function>
</xsl:stylesheet>

when applied to the provided XML (wrapped in a single top element to make it a well-formed XML document):

<t>
    <TopLevel CodeTL="Something">
        <Ratings>
              <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
                  <RatingByNumber Code="X" Rating="10" Number="1"/>
                  <RatingByNumber Code="X" Rating="19" Number="2"/>
              </Rating>
        </Ratings>
    </TopLevel>
    <TopLevel CodeTL="Something">
        <Ratings>
              <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
                  <RatingByNumber Code="X" Rating="10" Number="1"/>
                  <RatingByNumber Code="X" Rating="19" Number="2"/>
              </Rating>
        </Ratings>
    </TopLevel>
    <TopLevel CodeTL="Something">
        <Ratings>
              <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
                  <RatingByNumber Code="X" Rating="10" Number="1"/>
                  <RatingByNumber Code="X" Rating="19" Number="2"/>
              </Rating>
        </Ratings>
    </TopLevel>
    <TopLevel CodeTL="Something">
        <Ratings>
              <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
                  <RatingByNumber Code="X" Rating="30" Number="3"/>
                  <RatingByNumber Code="X" Rating="39" Number="4"/>
              </Rating>
        </Ratings>
    </TopLevel>
</t>

produces the wanted, correct result:

<t>
   <TopLevel CodeTL="Something">
      <Ratings>
         <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
            <RatingByNumber Code="X" Rating="10" Number="1"/>
            <RatingByNumber Code="X" Rating="19" Number="2"/>
         </Rating>
         <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
            <RatingByNumber Code="X" Rating="10" Number="1"/>
            <RatingByNumber Code="X" Rating="19" Number="2"/>
         </Rating>
         <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
            <RatingByNumber Code="X" Rating="10" Number="1"/>
            <RatingByNumber Code="X" Rating="19" Number="2"/>
            <RatingByNumber Code="X" Rating="30" Number="3"/>
            <RatingByNumber Code="X" Rating="39" Number="4"/>
         </Rating>
      </Ratings>
   </TopLevel>
</t>

Explanation:

  1. The grouping is implemented in the function my:grouping(), which is recursive.

  2. The top element is the only one at its level, so it needs no grouping beyond a shallow copy of itself. Inside the body of this shallow copy, the lower levels are grouped by the function my:grouping().

  3. The function my:grouping() takes a single argument: all the child elements of all the elements in one group at the level immediately above. It returns all groups at the current level.

  4. The sequence of elements passed to the function is grouped by signature: the name of the element concatenated with the name-value pairs of all its attributes (sorted by attribute name), separated by delimiters. The signature of an element is produced by the function my:signature().
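
For readers who want to trace the algorithm outside an XSLT processor, the four points above can be sketched in Python with the standard-library `xml.etree.ElementTree`. This is an illustrative re-implementation under assumptions, not the stylesheet itself: the function names `signature`, `grouping` and `compact` and the shortened sample are chosen here, and, like the stylesheet, the sketch keeps only elements and attributes (text content is dropped).

```python
import xml.etree.ElementTree as ET


def signature(elem):
    """Element name plus its attributes as name=value pairs, sorted by name."""
    attrs = "".join(f"{name}={value}|" for name, value in sorted(elem.attrib.items()))
    return f"{elem.tag}|{attrs}"


def grouping(elems):
    """Merge elements with equal signatures, then group their children recursively."""
    groups = {}  # dicts keep insertion order, so groups appear in first-occurrence order
    for elem in elems:
        groups.setdefault(signature(elem), []).append(elem)
    result = []
    for members in groups.values():
        merged = ET.Element(members[0].tag, dict(members[0].attrib))
        children = [child for member in members for child in member]
        merged.extend(grouping(children))
        result.append(merged)
    return result


def compact(root):
    """Shallow-copy the top element and group everything below it."""
    out = ET.Element(root.tag, dict(root.attrib))
    out.extend(grouping(list(root)))
    return out


# Shortened sample; the second Rating lists its attributes in a different order.
SAMPLE = """<t>
  <TopLevel CodeTL="Something"><Ratings>
    <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
      <RatingByNumber Code="X" Rating="10" Number="1"/>
    </Rating>
  </Ratings></TopLevel>
  <TopLevel CodeTL="Something"><Ratings>
    <Rating End="1-3-2012" CodeA="XYZ" Start="1-2-2012">
      <RatingByNumber Code="X" Rating="30" Number="3"/>
    </Rating>
  </Ratings></TopLevel>
</t>"""

result = compact(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

Sorting the attributes by name before concatenating them is what makes two elements with the same attributes in a different order produce the same signature.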


II. Generic XSLT 1.0 solution:

<xsl:stylesheet version="1.0"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
         xmlns:ext="http://exslt.org/common"
         xmlns:my="my:my" exclude-result-prefixes="my ext">
         <xsl:output omit-xml-declaration="yes" indent="yes"/>
         <xsl:strip-space elements="*"/>

         <xsl:variable name="vrtfPass1">
          <xsl:apply-templates select="/*"/>
         </xsl:variable>

         <xsl:variable name="vPass1" select="ext:node-set($vrtfPass1)"/>

         <xsl:template match="/">
          <xsl:apply-templates select="$vPass1/*" mode="pass2"/>
         </xsl:template>

         <xsl:template match="/*" mode="pass2">
             <xsl:copy>
               <xsl:call-template name="my:grouping">
                <xsl:with-param name="pElems" select="*"/>
               </xsl:call-template>
             </xsl:copy>
         </xsl:template>

         <xsl:template name="my:grouping">
           <xsl:param name="pElems" select="/.."/>

           <xsl:if test="$pElems">
             <xsl:for-each select="$pElems">
              <xsl:variable name="vPos" select="position()"/>

              <xsl:if test=
               "not(current()/@my:sign
                   = $pElems[not(position() >= $vPos)]/@my:sign
                   )">

                 <xsl:element name="{name()}">
                  <xsl:copy-of select="namespace::*[not(. = 'my:my')]"/>
                  <xsl:copy-of select="@*[not(name()='my:sign')]"/>
                   <xsl:call-template name="my:grouping">
                    <xsl:with-param name="pElems" select=
                    "$pElems[@my:sign = current()/@my:sign]/*"/>
                   </xsl:call-template>
                 </xsl:element>
               </xsl:if>

             </xsl:for-each>
           </xsl:if>
         </xsl:template>

     <xsl:template match="/*">
             <xsl:copy>
               <xsl:apply-templates/>
             </xsl:copy>
     </xsl:template>

     <xsl:template match="*/*">
      <xsl:variable name="vSignature">
       <xsl:call-template name="signature"/>
      </xsl:variable>
      <xsl:copy>
       <xsl:copy-of select="@*"/>
       <xsl:attribute name="my:sign">
        <xsl:value-of select="$vSignature"/>
       </xsl:attribute>

       <xsl:apply-templates/>
      </xsl:copy>
     </xsl:template>

     <xsl:template name="signature">
       <xsl:variable name="vsignAttribs">
         <xsl:for-each select="@*">
          <xsl:sort select="name()"/>

                <xsl:value-of select="concat(name(), '=', .,'|')"/>
             </xsl:for-each>
        </xsl:variable>

        <xsl:value-of select=
          "concat(name(), '|', $vsignAttribs)"/>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the same XML document (above), again the same correct result is produced:

<t>
   <TopLevel CodeTL="Something">
      <Ratings>
         <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
            <RatingByNumber Code="X" Rating="10" Number="1"/>
            <RatingByNumber Code="X" Rating="19" Number="2"/>
         </Rating>
         <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
            <RatingByNumber Code="X" Rating="10" Number="1"/>
            <RatingByNumber Code="X" Rating="19" Number="2"/>
         </Rating>
         <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
            <RatingByNumber Code="X" Rating="10" Number="1"/>
            <RatingByNumber Code="X" Rating="19" Number="2"/>
            <RatingByNumber Code="X" Rating="30" Number="3"/>
            <RatingByNumber Code="X" Rating="39" Number="4"/>
         </Rating>
      </Ratings>
   </TopLevel>
</t>

Explanation:

  1. This is a two-pass transformation.

  2. In the first pass, a signature is calculated for every element and becomes the value of a new attribute, my:sign.

  3. The same recursive grouping algorithm is used as with the XSLT 2.0 solution.
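
The two passes can be sketched in Python as well, again as an illustration under assumptions rather than a translation of the stylesheet. The marker attribute name `_sign` stands in for `my:sign` and is hypothetical (it would clash with a real attribute of that name), as are the function names and the shortened sample. Text content is not copied.

```python
import xml.etree.ElementTree as ET

SIGN = "_sign"  # hypothetical marker attribute, standing in for my:sign


def annotate(elem, is_root=True):
    """Pass 1: copy the tree, storing a signature on every element below the root."""
    copy = ET.Element(elem.tag, dict(elem.attrib))
    if not is_root:
        attrs = "".join(f"{n}={v}|" for n, v in sorted(elem.attrib.items()))
        copy.set(SIGN, f"{elem.tag}|{attrs}")
    for child in elem:
        copy.append(annotate(child, is_root=False))
    return copy


def grouping(elems):
    """Pass 2: emit one element per distinct signature, at its first occurrence."""
    result = []
    for pos, elem in enumerate(elems):
        sign = elem.get(SIGN)
        if any(prev.get(SIGN) == sign for prev in elems[:pos]):
            continue  # an earlier element already represents this group
        merged = ET.Element(
            elem.tag, {k: v for k, v in elem.attrib.items() if k != SIGN})
        children = [c for e in elems if e.get(SIGN) == sign for c in e]
        merged.extend(grouping(children))
        result.append(merged)
    return result


def compact_two_pass(root):
    annotated = annotate(root)
    out = ET.Element(annotated.tag, dict(annotated.attrib))
    out.extend(grouping(list(annotated)))
    return out


SAMPLE = """<t>
  <TopLevel CodeTL="A"><Ratings><Rating CodeA="XYZ"><RatingByNumber Number="1"/></Rating></Ratings></TopLevel>
  <TopLevel CodeTL="B"><Ratings><Rating CodeA="XYZ"><RatingByNumber Number="2"/></Rating></Ratings></TopLevel>
  <TopLevel CodeTL="A"><Ratings><Rating CodeA="XYZ"><RatingByNumber Number="3"/></Rating></Ratings></TopLevel>
</t>"""

result = compact_two_pass(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

The first-occurrence test compares each element's stored signature with those of the elements before it in the same sequence, which mirrors the `position()` comparison in the named template; the marker attribute is left out when the result elements are built.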

Dimitre Novatchev
  • That seems to be exactly what I wanted, except I'm stuck with 1.0. I'll see if I can do something about it. Thank you for the detailed answer. – Ed Fox Jul 10 '12 at 12:24
  • @EdFox: In XSLT 1.0 the same idea is used, but with a two-pass transformation that in the first pass creates a copy of each element and adds a special new element (or attribute) that contains the signature. In the second pass we do simple Muenchian grouping on this special element/attribute. – Dimitre Novatchev Jul 10 '12 at 12:28
  • Could you add the 1.0 version to your post? I'm sorry, I'm still a bit overwhelmed by XSLT, so I'm not confident that I can translate your explanation into actual code. – Ed Fox Jul 10 '12 at 13:11
  • @EdFox: With pleasure. I will be going to work shortly, so I will be able to start on a similar XSLT 1.0 solution about 10 hours from now, so please be patient. – Dimitre Novatchev Jul 10 '12 at 13:20
  • @EdFox: Done -- see Part II of this answer for the equivalent generic XSLT 1.0 transformation. – Dimitre Novatchev Jul 11 '12 at 04:07
  • why is the xmlns declaration in the output even though you excluded it in the header? – Ed Fox Jul 11 '12 at 13:55
  • @EdFox: Yes, I saw this -- this will disappear if the template name and the special attribute name (`my:sign`) are renamed to non-prefixed. I used the namespace for the attribute to ensure that this name cannot clash with an existing attribute name in the document -- we could use some very special name in no namespace instead -- something like: `"_________A_Very_Special_Attribute_________"` – Dimitre Novatchev Jul 11 '12 at 14:28
  • @EdFox: The unwanted namespace problem is fixed and I have replaced the XSLT 1.0 transformation with the fixed one -- just a minor change -- replaced `` with `` and added selective copying of the namespace nodes, excluding the `"my:my"` namespace. Therefore, please, ignore my previous comment recommending as solution not to use the special namespace. – Dimitre Novatchev Jul 12 '12 at 03:54
  • if some nodes contain text, to copy that I should add `` after `` (in the 1.0 version), correct? – Ed Fox Jul 18 '12 at 13:16
  • @EdFox: Yes, I would use `` as this is more generic and copies the whole subtree rooted in the current node. – Dimitre Novatchev Jul 18 '12 at 13:21