This is the input file.

Each block is wrapped in an <AllocFile> element (the tag was not showing up in the post, I don't know why), and all the blocks are wrapped in a top-level <XML> element.

<XML>
  <AllocFile>
    <alc>1</alc>
    <No>11/10</No>
    <DT>20090401</DT> 
    <G_H>147</G_H>
    <FUN>125487</FUN>
    <oH>11</oH>
    <y>9</y>
    <AMOUNT>8000000</AMOUNT>
    <Code>033195</Code>
    <hd1>1234</hd1>
  </AllocFile>
  <AllocFile>
    <alc>2</alc>
    <No>14/10</No>
    <DT>20090401</DT>
    <G_H>147</G_H>
    <FUN>125487</FUN>
    <oH>11</oH>
    <y>9</y>
    <AMOUNT>8400000</AMOUNT>
    <Code>033195</Code>
    <hd1>1234</hd1>
  </AllocFile>
  <AllocFile>
    <alc>3</alc>
    <No>74/10</No>
    <DT>20090401</DT>
    <G_H>147</G_H>
    <FUN>125487</FUN>
    <oH>11</oH>
    <y>9</y>
    <AMOUNT>8740000</AMOUNT>
    <Code>033195</Code>
    <hd1>1234</hd1>
  </AllocFile>
  <AllocFile>
    <alc>2</alc>
    <No>74/10</No>
    <DT>20090401</DT>
    <G_H>117</G_H>
    <FUN>125487</FUN>
    <oH>19</oH>
    <y>9</y>
    <AMOUNT>74512</AMOUNT>
    <Code>033118</Code>
    <hd1>1234</hd1>
  </AllocFile>
  <AllocFile>
    <alc>3</alc>
    <No>14/10</No>
    <DT>20090401</DT>
    <G_H>117</G_H>
    <FUN>125487</FUN>
    <oH>19</oH>
    <y>9</y>
    <AMOUNT>986541</AMOUNT>
    <Code>033147</Code>
    <hd1>1234</hd1>
  </AllocFile> 
</XML>

The output is

<Header1>
  <Hd1>1234</Hd1>
  <CodeHeader>
    <Code>033195</Code>
    <Header2>
      <G_H>147</G_H>
      <FUN>125487</FUN>
      <oH>11</oH>
      <y>9</y>
      <allocheader>
        <alc>1</alc>
        <No>11/10</No>
        <DT>20090401</DT>
        <AMOUNT>8000000</AMOUNT>
      </allocheader>
      <allocheader>
        <alc>2</alc>
        <No>14/10</No>
        <DT>20090401</DT>
        <AMOUNT>8400000</AMOUNT>
      </allocheader>
      <allocheader>
        <alc>3</alc>
        <No>74/10</No>
        <DT>20090401</DT>
        <AMOUNT>8740000</AMOUNT>
      </allocheader>
    </Header2>
  </CodeHeader>
  <CodeHeader>
    <Code>033118</Code>
    <Header2>
      <G_H>117</G_H>
      <FUN>125487</FUN>
      <oH>19</oH>
      <y>9</y>
      <allocheader>
        <alc>2</alc>
        <No>74/10</No>
        <DT>20090401</DT>
        <AMOUNT>74512</AMOUNT>
      </allocheader>
    </Header2>
  </CodeHeader>
  <CodeHeader>
    <Code>033147</Code>
    <Header2>
      <G_H>117</G_H>
      <FUN>125487</FUN>
      <oH>19</oH>
      <y>9</y>
      <allocheader>
        <alc>3</alc>
        <No>14/10</No>
        <DT>20090401</DT>
        <AMOUNT>986541</AMOUNT>
      </allocheader>
    </Header2>
  </CodeHeader>
</Header1>

The input file needs to be sorted and grouped on the basis of multiple keys. I tried using the concat function and the Muenchian method but didn't find much help on the web. I am using XSLT 1.0.

Rules for Grouping

  • All the nodes in the file will have <hd1> with the value 1234. This becomes the first group-by key and appears in the output as <Header1>.

  • The second key for grouping is the <Code> node. Nodes having the same value get grouped together. It appears in the output as <CodeHeader>.

  • The third key is the group of nodes G_H, FUN, oH, y. If all of these have the same values, the nodes get grouped together. It appears in the output as <Header2>.

  • No grouping happens on the nodes <alc>, <No>, <DT>, <AMOUNT>. They have distinct values within each group.

Manks
  • The provided desired output isn't well-formed XML document -- please, correct. – Dimitre Novatchev Jun 02 '12 at 15:42
  • added the tag at the end of o/p xml to make it a well formed xml doc – Manks Jun 02 '12 at 16:19
  • @Manks: You need to prefix any code samples with four spaces for it to show up. I've done this for you, so it should all display correctly now! – Tim C Jun 02 '12 at 18:44
  • You say "All the nodes in the file will have `<hd1>` with values 1234". Will this always be the case? What happens if `<hd1>` contains another value? If they are all the same, you are not really grouping by them, and only need the one key. – Tim C Jun 02 '12 at 18:58
  • I have updated my current answer to cope with the expanded requirements – Tim C Jun 04 '12 at 10:15
  • tons of thanks for the answer and detailed explanation :) !!! If possible can u tell me any good source to study about xslt on the web.I tried 'w3c' but its kinda ok, not too good. – Manks Jun 04 '12 at 11:42

1 Answer

If the hd1 element is always '1234' then you are not really grouping by it, but if you were, you would define a simple key like so

<xsl:key name="header1" match="AllocFile" use="hd1" />

For the second key, you would need to take account of the Code element

<xsl:key name="header2" match="AllocFile" use="concat(hd1, '|', Code)" />

And then for the last key, you would define a more complicated key to cope with all the elements

<xsl:key name="header3" 
   match="AllocFile" 
   use="concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y)" />

Do note the use of the 'pipe' character as the delimiter. It is important to pick a delimiter that would never occur in any of the selected elements.
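
To illustrate why the delimiter matters, here is a minimal sketch. It assumes a hypothetical input (not the XML from the question) with a `<rows>` root holding two `<row>` elements, one with `<a>1</a><b>23</b>` and one with `<a>12</a><b>3</b>`. The key names `noDelim` and `withDelim` are made up for this example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- Hypothetical keys over the hypothetical row elements -->
   <xsl:key name="noDelim" match="row" use="concat(a, b)"/>
   <xsl:key name="withDelim" match="row" use="concat(a, '|', b)"/>

   <xsl:template match="/rows">
      <!-- Without a delimiter both rows yield the key '123', so they collide -->
      <xsl:value-of select="count(key('noDelim', '123'))"/>
      <xsl:text> </xsl:text>
      <!-- With the delimiter the keys are '1|23' and '12|3', so only one row matches -->
      <xsl:value-of select="count(key('withDelim', '1|23'))"/>
   </xsl:template>
</xsl:stylesheet>
```

Against that input this should print `2 1`: the undelimited key wrongly puts both rows in one group, while the delimited key keeps them apart.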

Then, to look for the distinct header1 elements, you would look for the elements which appear first in the header1 key

<xsl:apply-templates 
   select="AllocFile[generate-id() = generate-id(key('header1', hd1)[1])]" 
   mode="header1" />

To find the distinct Code elements within each header1 element, you would do the following

<xsl:apply-templates 
   select="key('header1', hd1)
     [generate-id() = generate-id(key('header2', concat(hd1, '|', Code))[1])]" 
   mode="header2" /> 

Finally, within each code group, to find the distinct 'header3' elements, you would look for the first elements within the third key

<xsl:apply-templates 
 select="key('header2', concat(hd1, '|', Code))
    [generate-id() = 
     generate-id(key('header3', concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y))[1])]" 
 mode="header3" /> 
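
As a side note, the "first node in its group" test is often written with count() instead of generate-id(). The sketch below is an alternative form of the same Muenchian test, not something the stylesheet in this answer requires. It assumes the sample input from the question and simply lists each distinct hd1/Code combination once.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <xsl:key name="header2" match="AllocFile" use="concat(hd1, '|', Code)"/>

   <xsl:template match="/XML">
      <!-- The union of a node with the first node of its group has exactly
           one member only when the two are the same node -->
      <xsl:for-each select="AllocFile[count(. | key('header2', concat(hd1, '|', Code))[1]) = 1]">
         <xsl:value-of select="Code"/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

Both forms select the same nodes, so which one to use is mostly a matter of taste.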

Here is the full XSLT

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:key name="header1" match="AllocFile" use="hd1"/>
   <xsl:key name="header2" match="AllocFile" use="concat(hd1, '|', Code)"/>
   <xsl:key name="header3" match="AllocFile" use="concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y)"/>

   <xsl:template match="/XML">
      <xsl:apply-templates select="AllocFile[generate-id() = generate-id(key('header1', hd1)[1])]" mode="header1"/>
   </xsl:template>

   <xsl:template match="AllocFile" mode="header1">
      <Header1>
         <Hd1>
            <xsl:value-of select="hd1"/>
         </Hd1>
         <xsl:apply-templates select="key('header1', hd1)[generate-id() = generate-id(key('header2', concat(hd1, '|', Code))[1])]" mode="header2"/>
      </Header1>
   </xsl:template>

   <xsl:template match="AllocFile" mode="header2">
      <CodeHeader>
         <xsl:copy-of select="Code"/>
         <xsl:apply-templates select="key('header2', concat(hd1, '|', Code))[generate-id() = generate-id(key('header3', concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y))[1])]" mode="header3"/>
      </CodeHeader>
   </xsl:template>

   <xsl:template match="AllocFile" mode="header3">
      <Header2>
         <xsl:copy-of select="G_H|FUN|oH|y"/>
         <xsl:apply-templates select="key('header3', concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y))"/>
      </Header2>
   </xsl:template>

   <xsl:template match="AllocFile">
      <allocheader>
         <xsl:copy-of select="alc|No|DT|AMOUNT"/>
      </allocheader>
   </xsl:template>
</xsl:stylesheet>

Do note the use of the mode attribute on the template matching to distinguish between the multiple templates all matching the AllocFile elements.
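
To show the mode mechanism in isolation, here is a minimal sketch with no grouping at all. The output element names Report, Summary and Detail and the mode name summary are invented for this illustration; the input is assumed to be the sample XML from the question.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:template match="/XML">
      <Report>
         <!-- Same nodes, processed twice: once in 'summary' mode, once with no mode -->
         <xsl:apply-templates select="AllocFile" mode="summary"/>
         <xsl:apply-templates select="AllocFile"/>
      </Report>
   </xsl:template>

   <!-- Only chosen when apply-templates asks for mode="summary" -->
   <xsl:template match="AllocFile" mode="summary">
      <Summary><xsl:value-of select="No"/></Summary>
   </xsl:template>

   <!-- Only chosen when apply-templates specifies no mode -->
   <xsl:template match="AllocFile">
      <Detail><xsl:copy-of select="alc|AMOUNT"/></Detail>
   </xsl:template>
</xsl:stylesheet>
```

A template with a mode is never picked for an apply-templates without one, and vice versa, which is what lets several templates share the same match pattern.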

When applied to your sample XML, the following is output

<Header1>
   <Hd1>1234</Hd1>
   <CodeHeader>
      <Code>033195</Code>
      <Header2>
         <G_H>147</G_H>
         <FUN>125487</FUN>
         <oH>11</oH>
         <y>9</y>
         <allocheader>
            <alc>1</alc>
            <No>11/10</No>
            <DT>20090401</DT>
            <AMOUNT>8000000</AMOUNT>
         </allocheader>
         <allocheader>
            <alc>2</alc>
            <No>14/10</No>
            <DT>20090401</DT>
            <AMOUNT>8400000</AMOUNT>
         </allocheader>
         <allocheader>
            <alc>3</alc>
            <No>74/10</No>
            <DT>20090401</DT>
            <AMOUNT>8740000</AMOUNT>
         </allocheader>
      </Header2>
   </CodeHeader>
   <CodeHeader>
      <Code>033118</Code>
      <Header2>
         <G_H>117</G_H>
         <FUN>125487</FUN>
         <oH>19</oH>
         <y>9</y>
         <allocheader>
            <alc>2</alc>
            <No>74/10</No>
            <DT>20090401</DT>
            <AMOUNT>74512</AMOUNT>
         </allocheader>
      </Header2>
   </CodeHeader>
   <CodeHeader>
      <Code>033147</Code>
      <Header2>
         <G_H>117</G_H>
         <FUN>125487</FUN>
         <oH>19</oH>
         <y>9</y>
         <allocheader>
            <alc>3</alc>
            <No>14/10</No>
            <DT>20090401</DT>
            <AMOUNT>986541</AMOUNT>
         </allocheader>
      </Header2>
   </CodeHeader>
</Header1>

If you did have different hd1 elements, other than '1234', you would end up with multiple Header1 elements, and so your output would not be well-formed XML. It would be simple enough to wrap them in a root element, though, by modifying the initial template matching the document element.

<xsl:template match="/XML">
   <Root>
      <xsl:apply-templates select="AllocFile[generate-id() = generate-id(key('header1', hd1)[1])]" mode="header1" />
   </Root>
</xsl:template>
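
The question also mentions sorting, which the stylesheet above does not do; groups come out in document order. Here is a reduced sketch of how xsl:sort could be combined with the grouping. It assumes ascending Code order for the groups and numeric alc order within each group (the question does not state the wanted order), and it groups on Code only to stay short. The key name byCode and the mode name code are made up for this example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:key name="byCode" match="AllocFile" use="Code"/>

   <xsl:template match="/XML">
      <Root>
         <!-- First AllocFile of each Code group, visited in ascending Code order -->
         <xsl:apply-templates select="AllocFile[generate-id() = generate-id(key('byCode', Code)[1])]" mode="code">
            <xsl:sort select="Code"/>
         </xsl:apply-templates>
      </Root>
   </xsl:template>

   <xsl:template match="AllocFile" mode="code">
      <CodeHeader>
         <xsl:copy-of select="Code"/>
         <!-- Members of the group, ordered numerically by alc (assumed order) -->
         <xsl:apply-templates select="key('byCode', Code)">
            <xsl:sort select="alc" data-type="number"/>
         </xsl:apply-templates>
      </CodeHeader>
   </xsl:template>

   <xsl:template match="AllocFile">
      <allocheader>
         <xsl:copy-of select="alc|No|DT|AMOUNT"/>
      </allocheader>
   </xsl:template>
</xsl:stylesheet>
```

The same xsl:sort children can be added to the apply-templates calls in the full stylesheet, one per grouping level.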
Tim C
  • the answer provided is awesome , but I missed a rule for grouping which I have added now , can u please what more modifications to the xslt is required. I tried to add one more key for code header and in the header 1 match added one more node xsl: apply templates corr. to the code header .but it gave me compilation error. – Manks Jun 04 '12 at 07:24
  • Can you expand your question a bit more, because I am not quite sure of the new requirement. The output shown in my answer currently matches the expected output in your question, you see. Thanks! – Tim C Jun 04 '12 at 09:21
  • the is perfectly correct but my point was that if there are nodes with multiple values , how to proceed with grouping . The Changes I made to the XSL was 1. defined a key and 2. and added and removed the statement .but something went wrong and its not giving the desired output . – Manks Jun 04 '12 at 09:37
  • Can you amend your input sample to show the case where there are multiple `<Code>` values? Will there be a limit on the number of `<Code>` values that can appear? – Tim C Jun 04 '12 at 09:40
  • I have changed the code values for the last two blocks. There is no limit as such it can be different for all these blocks(worst case) and can be the same also(best case). – Manks Jun 04 '12 at 09:44
  • I am beginning to understand now! But can you amend your expected output, because codes 033118 and 033147 do not appear in your output, so I am not 100% sure how you want to handle them. Thanks! – Tim C Jun 04 '12 at 09:47
  • Changed the output to reflect the scenario ..Thanks in advance – Manks Jun 04 '12 at 09:56
  • tons of thanks for the answer and detailed explanation :) !!! If possible can u tell me any good source to study about xslt on the web.I tried w3c but its kinda ok not too good. – Manks Jun 04 '12 at 11:41