
I have an XML document that looks like this:

<resultset>
    <hit>
        <content>
            <ITEM>
                <TITLE>Office Cleaning</TITLE>
                <DESCRIPTION>blah blah blah</DESCRIPTION>
                <Hierarchy>level1A:level2A:level3A</Hierarchy>
                <Hierarchy>level1B:level2B:level3B</Hierarchy>
            </ITEM>
        </content>
    </hit>
    <hit>
        <content>
            <ITEM>
                <TITLE>Office Cleaning1</TITLE>
                <DESCRIPTION>blah blah blah</DESCRIPTION>
                <Hierarchy>level1A:level2A:level3A</Hierarchy>
            </ITEM>
        </content>
    </hit>
    <hit>
        <content>
            <ITEM>
                <TITLE>Office Cleaning2</TITLE>
                <DESCRIPTION>blah blah blah</DESCRIPTION>
                <Hierarchy>level1A:level2B:level3C</Hierarchy>
            </ITEM>
        </content>
    </hit>
</resultset>

Note that an item can have multiple `Hierarchy` elements, each holding a concatenated string of the form `level1:level2:level3`. I am looking to transform this into something like this:

<TREE>
<LEVELS>
<LEVEL1 name="level1A">
 <LEVEL2 name="level2A">
   <LEVEL3 name="level3A">
      <ITEM Name="Office Cleaning"/>
      <ITEM Name="Office Cleaning1"/>
   </LEVEL3>
 </LEVEL2>
</LEVEL1>
<LEVEL1 name="level1B">
 <LEVEL2 name="level2B">
   <LEVEL3 name="level3B">
        <ITEM Name="Office Cleaning"/>
   </LEVEL3>
 </LEVEL2>
</LEVEL1>
<LEVEL1 name="level1A">
 <LEVEL2 name="level2B">
   <LEVEL3 name="level3C">
      <ITEM Name="Office Cleaning2"/>
    </LEVEL3>
 </LEVEL2>
</LEVEL1>
</LEVELS>
</TREE>

Basically, each item can have multiple hierarchies associated with it, and I need to group the items by hierarchy.
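A minimal Python sketch of the grouping described above. It is not XSLT and uses only the standard-library `xml.etree.ElementTree`; the helper name `group_by_hierarchy` is hypothetical. Each distinct `Hierarchy` string becomes one group, in document order, holding the titles of every ITEM that carries it. Splitting that string on `:` then yields the levels. The sample drops the DESCRIPTION elements for brevity.

```python
import xml.etree.ElementTree as ET

# The question's sample input, with the DESCRIPTION elements omitted for brevity.
SAMPLE = """<resultset>
  <hit><content><ITEM><TITLE>Office Cleaning</TITLE>
    <Hierarchy>level1A:level2A:level3A</Hierarchy>
    <Hierarchy>level1B:level2B:level3B</Hierarchy></ITEM></content></hit>
  <hit><content><ITEM><TITLE>Office Cleaning1</TITLE>
    <Hierarchy>level1A:level2A:level3A</Hierarchy></ITEM></content></hit>
  <hit><content><ITEM><TITLE>Office Cleaning2</TITLE>
    <Hierarchy>level1A:level2B:level3C</Hierarchy></ITEM></content></hit>
</resultset>"""


def group_by_hierarchy(xml_text):
    """Map each distinct Hierarchy string to the TITLEs of the ITEMs carrying it.

    dicts keep insertion order (Python 3.7+), so groups follow the document
    order of each string's first occurrence.
    """
    root = ET.fromstring(xml_text)
    groups = {}
    for item in root.iter("ITEM"):
        title = item.findtext("TITLE")
        for hier in item.findall("Hierarchy"):
            groups.setdefault(hier.text, []).append(title)
    return groups


for path, titles in group_by_hierarchy(SAMPLE).items():
    levels = path.split(":")  # e.g. ['level1A', 'level2A', 'level3A']
    print(levels, titles)
```

The three groups correspond to the three LEVEL1 branches in the wanted output. This is why `level1A` appears twice there.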

I only got as far as this:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:autn="http://schemas.autonomy.com/aci/">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:key name="HIERARCHYLEVELS" match="resultset/hit/content/ITEM" use="HIERARCHY" />
<xsl:template match="/">
<TREE>

    <xsl:for-each select="resultset/hit/content/ITEM[generate-id()=generate-id(key('HIERARCHYLEVELS', HIERARCHY)[1])]">
        <xsl:for-each select="HIERARCHY">
        <xsl:variable name="level" select="HIERARCHY"/>
        <HIERARCHY name="{$level}" >

            <xsl:variable name="name" select="TITLE"/>

            <ITEM name="{$name}"/>

        </HIERARCHY>
        </xsl:for-each>
    </xsl:for-each>


</TREE>
</xsl:template>

</xsl:stylesheet>

The problem is that I only get the first matching hierarchy tag. For example, I don't see "Office Cleaning1" in the output. What can I do to make sure all hierarchy elements are considered? I still need to split each one into its levels.
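One plausible reading of the symptom, sketched in Python rather than XSLT, is that the stylesheet groups ITEM elements rather than Hierarchy values. This sketch assumes the key's `use` is corrected to `Hierarchy`. XML names are case-sensitive, so `HIERARCHY` as posted does not match the input's `Hierarchy` elements. Under that assumption, the Muenchian test keeps an ITEM only when it is the first node `key()` returns for its own Hierarchy values. `Office Cleaning1` shares `level1A:level2A:level3A` with the earlier `Office Cleaning`, so it is never first and gets dropped. The function names are hypothetical.

```python
# Each ITEM as (TITLE, [Hierarchy, ...]) in document order, from the sample input.
ITEMS = [
    ("Office Cleaning", ["level1A:level2A:level3A", "level1B:level2B:level3B"]),
    ("Office Cleaning1", ["level1A:level2A:level3A"]),
    ("Office Cleaning2", ["level1A:level2B:level3C"]),
]


def muenchian_on_items(items):
    """Keep an ITEM only if it is the first node key() returns for its Hierarchy values.

    key() called with several values returns the union of the matching groups
    in document order, so [1] is the earliest ITEM sharing any of those values.
    An ITEM with no Hierarchy has no key value and is never selected.
    """
    kept = []
    for i, (title, hiers) in enumerate(items):
        first = min(
            (j for j, (_, other) in enumerate(items) if set(hiers) & set(other)),
            default=None,
        )
        if first == i:
            kept.append(title)
    return kept


def group_on_hierarchy(items):
    """Group on each distinct Hierarchy value instead, collecting every ITEM in it."""
    groups = {}
    for title, hiers in items:
        for hier in hiers:
            groups.setdefault(hier, []).append(title)
    return groups


print(muenchian_on_items(ITEMS))  # Office Cleaning1 is missing, as in the question
print(group_on_hierarchy(ITEMS)["level1A:level2A:level3A"])
```

Grouping on the Hierarchy elements themselves, as the answers below do, brings `Office Cleaning1` back.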

  • Your input document is malformed. Please correct. – Sean B. Durkin Oct 23 '12 at 01:36
  • I just edited it, Sean. Thanks for pointing that out. – user1766784 Oct 23 '12 at 01:54
  • Your expected output is malformed. Please correct. – Sean B. Durkin Oct 23 '12 at 02:41
  • Are the element names like `LEVEL2` in the output document meant to be derived from the co-located `name` attribute, or from the actual positional level of the output node (the positional level is implied by this XPath expression: `TREE/LEVEL1/LEVEL2`)? – Sean B. Durkin Oct 23 '12 at 02:47
  • Your input node `level1B:level2B:level3B` does not map to any output. Is this intentional? Please explain the rule operating here. – Sean B. Durkin Oct 23 '12 at 02:51
  • The element names in the output document are not derived. It's up to me what to call them. Each Hierarchy element in the input XML, which has the form level1:level2:level3, is used to populate the values of the name attributes in the output XML. – user1766784 Oct 23 '12 at 02:53
  • Why are the levels within the Hierarchy text lowercase, but the LEVEL1 element names in uppercase? Is the case important? – Sean B. Durkin Oct 23 '12 at 02:56
  • Nope, case is not important. Don't worry about that. – user1766784 Oct 23 '12 at 02:58
  • As I mentioned, element names in the output can be called anything. Something like top, middle, bottom would be fine too. What is important is that the name attribute of the top element holds the first step of the Hierarchy string, and so on down the levels. For each hierarchy, its matching ITEM elements should be listed. – user1766784 Oct 23 '12 at 03:08
  • The first step of the first `Office Cleaning` `Hierarchy` element is level1A. The first step of `Office Cleaning2` is also level1A. So why do `Office Cleaning` and `Office Cleaning2` descend from different `LEVEL1` nodes in the output? This seems to contradict the grouping rule implied by the first `Office Cleaning` and `Office Cleaning1`. – Sean B. Durkin Oct 23 '12 at 04:48
  • Thanks, guys. I have yet to try either of the solutions; I will get to them in an hour. – user1766784 Oct 23 '12 at 12:01
  • Actually, both solutions worked for me. I picked Dimitre's because it has fewer lines of code. Thanks a ton, both of you! – user1766784 Oct 23 '12 at 13:22

2 Answers


This transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kItemByHier" match="ITEM" use="Hierarchy"/>
 <xsl:key name="kHierByVal" match="Hierarchy" use="."/>

 <xsl:template match="/*">
     <xsl:apply-templates select=
     "*/*/*/Hierarchy[generate-id()=generate-id(key('kHierByVal',.)[1])]"/>
 </xsl:template>

  <xsl:template match="Hierarchy">
    <xsl:call-template name="makeTree">
      <xsl:with-param name="pHier" select="string()"/>
      <xsl:with-param name="pItems" select="key('kItemByHier', .)"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="makeTree">
    <xsl:param name="pHier"/>
    <xsl:param name="pDepth" select="1"/>
    <xsl:param name="pItems" select="/.."/>

    <xsl:choose>
      <xsl:when test="not($pHier)">
        <xsl:for-each select="$pItems">
          <ITEM name="{TITLE}"/>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:element name="LEVEL{$pDepth}">
          <xsl:attribute name="name">
            <xsl:value-of select="substring-before(concat($pHier,':'), ':')"/>
          </xsl:attribute>

          <xsl:call-template name="makeTree">
            <xsl:with-param name="pHier" 
                            select="substring-after($pHier,':')"/>
            <xsl:with-param name="pDepth" select="$pDepth+1"/>
            <xsl:with-param name="pItems" select="$pItems"/>
          </xsl:call-template>
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<resultset>
    <hit>
        <content>
            <ITEM>
                <TITLE>Office Cleaning</TITLE>
                <DESCRIPTION>blah blah blah</DESCRIPTION>
                <Hierarchy>level1A:level2A:level3A</Hierarchy>
                <Hierarchy>level1B:level2B:level3B</Hierarchy>
            </ITEM>
        </content>
    </hit>
    <hit>
        <content>
            <ITEM>
                <TITLE>Office Cleaning1</TITLE>
                <DESCRIPTION>blah blah blah</DESCRIPTION>
                <Hierarchy>level1A:level2A:level3A</Hierarchy>
            </ITEM>
        </content>
    </hit>
    <hit>
        <content>
            <ITEM>
                <TITLE>Office Cleaning2</TITLE>
                <DESCRIPTION>blah blah blah</DESCRIPTION>
                <Hierarchy>level1A:level2B:level3C</Hierarchy>
            </ITEM>
        </content>
    </hit>
</resultset>

produces the wanted, correct result:

<LEVEL1 name="level1A">
   <LEVEL2 name="level2A">
      <LEVEL3 name="level3A">
         <ITEM name="Office Cleaning"/>
         <ITEM name="Office Cleaning1"/>
      </LEVEL3>
   </LEVEL2>
</LEVEL1>
<LEVEL1 name="level1B">
   <LEVEL2 name="level2B">
      <LEVEL3 name="level3B">
         <ITEM name="Office Cleaning"/>
      </LEVEL3>
   </LEVEL2>
</LEVEL1>
<LEVEL1 name="level1A">
   <LEVEL2 name="level2B">
      <LEVEL3 name="level3C">
         <ITEM name="Office Cleaning2"/>
      </LEVEL3>
   </LEVEL2>
</LEVEL1>
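The following is a rough Python analogue of the `makeTree` recursion, for readers who find it easier to follow outside XSLT. It is a sketch, not a port. The name `make_tree` and the `TREE` wrapper are assumptions; the wrapper exists only to give ElementTree a single root. Each call peels one `:`-separated step off the string and names the element `LEVEL{depth}` from the recursion depth rather than from the step text. Once the string is exhausted, it emits the ITEM leaves.

```python
import xml.etree.ElementTree as ET


def make_tree(parent, hier, titles, depth=1):
    """Rough analogue of the makeTree template (a sketch, not a port).

    Consumes one ':'-separated step per call and emits LEVEL{depth} named
    after it; once the string is exhausted, emits one ITEM per title.
    """
    if not hier:
        for title in titles:
            ET.SubElement(parent, "ITEM", name=title)
        return
    # partition() stands in for substring-before(concat($pHier, ':'), ':')
    # and substring-after($pHier, ':'); rest is '' after the last step.
    step, _, rest = hier.partition(":")
    level = ET.SubElement(parent, f"LEVEL{depth}", name=step)
    make_tree(level, rest, titles, depth + 1)


# One entry per distinct Hierarchy value, mirroring key('kItemByHier', .).
groups = {
    "level1A:level2A:level3A": ["Office Cleaning", "Office Cleaning1"],
    "level1B:level2B:level3B": ["Office Cleaning"],
    "level1A:level2B:level3C": ["Office Cleaning2"],
}

root = ET.Element("TREE")  # assumed wrapper; the stylesheet emits no single root
for hier, titles in groups.items():
    make_tree(root, hier, titles)
print(ET.tostring(root, encoding="unicode"))
```

As in the stylesheet, the element names come from the depth, so they stay `LEVEL1`, `LEVEL2` and `LEVEL3` whatever the step values are.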
Dimitre Novatchev

For interest, here is a draft solution. It is close but not quite right, as the output shows, because it applies different grouping rules. I am still trying to understand the required grouping rules and will update this answer if I get a better understanding.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="xsl exsl">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="phase-1-output">
  <xsl:apply-templates select="/" mode="phase-1" />
</xsl:variable>

<xsl:variable name="phase-2-output">
  <xsl:apply-templates select="exsl:node-set($phase-1-output)" mode="phase-2" />
</xsl:variable>

<xsl:template match="/">
  <xsl:copy-of select="$phase-2-output" />
</xsl:template>

<!--================ Phase 1 ===============================-->    
 <xsl:template match="/" mode="phase-1">
   <t>
     <xsl:apply-templates select="*/*/*/ITEM/Hierarchy" mode="phase-1" />
   </t>
 </xsl:template>

 <xsl:template match="Hierarchy" mode="phase-1">
   <xsl:call-template name="analyze-hierarchy">
     <xsl:with-param name="levels" select="." />
     <xsl:with-param name="item" select="../TITLE" />
   </xsl:call-template>  
 </xsl:template>

<xsl:template name="analyze-hierarchy"><!-- phase-1 -->
  <xsl:param name="levels" />
  <xsl:param name="item" />
  <xsl:variable name="level" select="substring-before(concat($levels,':'),':')" />
  <xsl:variable name="e-level" select="
    translate(
      substring($level,1,string-length($level) - 1),
      'abcdefghijklmnopqrstuvwxyz',
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ')" />
  <xsl:choose>
    <xsl:when test="$level">
      <xsl:element name="{$e-level}">
        <xsl:attribute name="name"><xsl:value-of select="$level" /></xsl:attribute>  
        <xsl:call-template name="analyze-hierarchy">
         <xsl:with-param name="levels" select="substring-after($levels,':')" />
         <xsl:with-param name="item" select="$item" />
        </xsl:call-template>  
      </xsl:element>
    </xsl:when>
    <xsl:otherwise>
      <ITEM Name="{$item}"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>  

<!--================ Phase 2 ===============================-->    
<xsl:key name="kLevel"
   match="*[starts-with(name(),'LEVEL')]"
   use="concat(generate-id(..),'|',@name)" />

 <xsl:template match="/" mode="phase-2">
  <TREE>
    <LEVELS>
      <xsl:variable name="t" select="concat(generate-id(t),'|')" />
      <xsl:apply-templates select="t/LEVEL1[
      generate-id() = generate-id( key('kLevel',concat($t,@name))[1])
      ]" mode="phase-2-head" />
    </LEVELS>  
  </TREE>  
</xsl:template>

<xsl:template match="*[starts-with(name(),'LEVEL')]" mode="phase-2-head">
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:apply-templates select="key('kLevel',concat(generate-id(..),'|',@name))"  mode="phase-2" />
    <xsl:copy-of select="ITEM" />
   </xsl:copy>
 </xsl:template>   

 <xsl:template match="*[starts-with(name(),'LEVEL')]" mode="phase-2">
      <xsl:variable name="p" select="concat(generate-id(.),'|')" />
      <xsl:apply-templates select="*[starts-with(name(),'LEVEL')][
      generate-id() = generate-id( key('kLevel',concat($p,@name))[1])
      ]" mode="phase-2-head" />
 </xsl:template>   

</xsl:stylesheet>

...which, with the sample input, produces this (not quite correct) output:

<TREE>
  <LEVELS>
    <LEVEL1 name="level1A">
      <LEVEL2 name="level2A">
        <LEVEL3 name="level3A">
          <ITEM Name="Office Cleaning" />
        </LEVEL3>
      </LEVEL2>
      <LEVEL2 name="level2A">
        <LEVEL3 name="level3A">
          <ITEM Name="Office Cleaning1" />
        </LEVEL3>
      </LEVEL2>
      <LEVEL2 name="level2B">
        <LEVEL3 name="level3C">
          <ITEM Name="Office Cleaning2" />
        </LEVEL3>
      </LEVEL2>
    </LEVEL1>
    <LEVEL1 name="level1B">
      <LEVEL2 name="level2B">
        <LEVEL3 name="level3B">
          <ITEM Name="Office Cleaning" />
        </LEVEL3>
      </LEVEL2>
    </LEVEL1>
  </LEVELS>
</TREE>
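The following is a simplified Python model of why the draft's output differs. It reflects one reading of the stylesheet, and `phase_two` is a hypothetical name. Phase 1 turns every Hierarchy into its own chain of LEVEL elements under the temporary `<t>` root. The `kLevel` key then groups elements by parent id plus `@name`. Only the LEVEL1 elements share a parent, so they merge by name. Each chain below LEVEL1 keeps its own parent node and is never merged, which is where the duplicate `level2A` comes from.

```python
def phase_two(chains):
    """Model the draft's (parent id, @name) grouping over the phase-1 tree.

    `chains` is a list of (Hierarchy, TITLE) pairs, one per phase-1 chain. All
    LEVEL1 nodes share the <t> parent, so they merge by name; everything below
    stays a separate chain because each chain has its own parent node.
    """
    merged = {}
    for hier, title in chains:
        first, _, rest = hier.partition(":")
        merged.setdefault(first, []).append((rest, title))
    return merged


# Phase-1 chains in document order: one per Hierarchy element in the sample input.
CHAINS = [
    ("level1A:level2A:level3A", "Office Cleaning"),
    ("level1B:level2B:level3B", "Office Cleaning"),
    ("level1A:level2A:level3A", "Office Cleaning1"),
    ("level1A:level2B:level3C", "Office Cleaning2"),
]

for level1, children in phase_two(CHAINS).items():
    # The LEVEL2 names under each merged LEVEL1, duplicates and all.
    print(level1, [rest.split(":")[0] for rest, _ in children])
```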

UPDATE

OK, round 2. I copied Dimitre's grouping rule, which is all-or-nothing on the content of the Hierarchy element. This solution produces the expected output for the sample input. Note that, in contrast to Dimitre's <xsl:element name="LEVEL{$pDepth}"> method, I have derived the LEVEL1-style element names from the Hierarchy steps. I am not sure whether this is what you intended.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes" />
<xsl:strip-space elements="*" />

<xsl:key name="kLevel" match="Hierarchy" use="." />

<xsl:template match="/">
  <TREE>
    <LEVELS>
     <xsl:apply-templates select="*/*/*/ITEM/Hierarchy[
        generate-id() = generate-id( key('kLevel',.)[1])
      ]" mode="group" />
    </LEVELS>
   </TREE>
</xsl:template>

 <xsl:template match="Hierarchy" mode="group">
   <xsl:call-template name="analyze-hierarchy">
     <xsl:with-param name="key" select="." />
     <xsl:with-param name="levels" select="." />
   </xsl:call-template>  
 </xsl:template>

<xsl:template name="analyze-hierarchy">
  <xsl:param name="key" />
  <xsl:param name="levels" />
  <xsl:variable name="level" select="substring-before(concat($levels,':'),':')" />
  <xsl:variable name="e-level" select="
    translate(
      substring($level,1,string-length($level) - 1),
      'abcdefghijklmnopqrstuvwxyz',
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ')" />
  <xsl:choose>
    <xsl:when test="$level">
      <xsl:element name="{$e-level}">
        <xsl:attribute name="name"><xsl:value-of select="$level" /></xsl:attribute>  
        <xsl:call-template name="analyze-hierarchy">
         <xsl:with-param name="key" select="$key" />
         <xsl:with-param name="levels" select="substring-after($levels,':')" />
        </xsl:call-template>  
      </xsl:element>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates select="key('kLevel',$key)" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>  

<xsl:template match="Hierarchy">
  <ITEM Name="{../TITLE}" />
</xsl:template>

</xsl:stylesheet>
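The `$e-level` expression above derives the element name from the step text. It drops the last character and upper-cases the rest, so `level1A` becomes `LEVEL1`. Below is a small Python comparison with Dimitre's depth-based naming; both helper names are hypothetical.

```python
def name_from_step(step):
    """Like $e-level: drop the final character, then upper-case.

    XSLT's translate() here only maps a-z to A-Z; str.upper() also maps
    non-ASCII letters, which makes no difference for these sample steps.
    """
    return step[:-1].upper()


def name_from_depth(depth):
    """Like Dimitre's name="LEVEL{$pDepth}": derived from position only."""
    return f"LEVEL{depth}"


for depth, step in enumerate("level1A:level2B:level3C".split(":"), start=1):
    print(step, name_from_step(step), name_from_depth(depth))
```

The two schemes agree only while every step follows the pattern `levelN` plus one trailing character. For a step such as `Cleaning`, the derived name would be `CLEANIN`. Since the question says the output element names can be anything, the depth-based name is arguably the safer choice.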
Sean B. Durkin
  • Sean, you are making a statement about my answer, which I don't understand. What do you mean by: "Dimitre's grouping rule, which is all or nothing on the content of the Hierarchy element" ? – Dimitre Novatchev Oct 23 '12 at 11:53
  • @DimitreNovatchev The `Hierarchy/text()` nodes can be viewed as a sequence of steps (tokenize the string value by ':'). The term 'all or nothing' means that the output is produced by grouping the input solely on the key of `Hierarchy/text()`. This is in contrast to recursive grouping where the LEVEL1 output is produced by grouping the input on the first step; and then recursively LEVEL2 output beneath LEVEL1 is produced by the grouping of input based on the first two steps. – Sean B. Durkin Oct 23 '12 at 15:19
  • A concrete example: Say the input has two `Hierarchy` elements with text content `a:b:c` and `a:x:y`. Under an all-or-nothing grouping rule, the corresponding output elements have no common ancestor (except root). Under a recursive grouping rule, the corresponding output elements would have a common ancestor, being the output (LEVEL1) element corresponding to `a`. – Sean B. Durkin Oct 23 '12 at 15:23
  • Sean, I think you misunderstood the question; we can often understand a question by looking at the provided wanted result. – Dimitre Novatchev Oct 23 '12 at 17:13
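Sean's concrete example from these comments can be made runnable. The following is a minimal Python sketch of the two grouping rules discussed; the function names are hypothetical. The all-or-nothing rule keeps one top-level branch per distinct Hierarchy string. Recursive grouping instead builds a trie keyed on one step at a time.

```python
def all_or_nothing(paths):
    """Group on the whole Hierarchy string: one top-level branch per distinct value."""
    return list(dict.fromkeys(paths))  # de-duplicate, keeping first-seen order


def recursive(paths):
    """Group on the first step, then recursively on the following steps (a nested-dict trie)."""
    tree = {}
    for path in paths:
        node = tree
        for step in path.split(":"):
            node = node.setdefault(step, {})
    return tree


example = ["a:b:c", "a:x:y"]
print([p.split(":")[0] for p in all_or_nothing(example)])  # two separate 'a' branches
print(recursive(example))  # one 'a' branch with 'b' and 'x' beneath it
```

Take the question's Hierarchy values in document order. The all-or-nothing rule gives three top-level branches, two of them `level1A`, which is the shape of the wanted output in the question. The recursive rule would merge them under a single `level1A`.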