4

I need to transform large XML files that have a nested (hierarchical) structure of the form

<Root>
   Flat XML
   Hierarchical XML (multiple blocks, some repetitive)
   Flat XML
</Root>

into a flatter ("shredded") form, with 1 block for each repetitive nested block.

The data has numerous different tags and hierarchy variations (especially in the number of tags of the shredded XML before and after the hierarchical XML), so ideally no assumption should be made about tag and attribute names, or the hierarchical level.

A top-level view of the hierarchy for just 4 levels would look something like

<Level-1>
   ...
   <Level-2>
      ...
      <Level-3>
        ...
        <Level-4>A</Level-4>
        <Level-4>B</Level-4>
        ...
      </Level-3>
      ...
   </Level-2>
   ...
</Level-1>

and the desired output would then be

<Level-1>
  ...
  <Level-2>
    ...
    <Level-3>
      ...
      <Level-4>A</Level-4>
      ...
    </Level-3>
    ...
  </Level-2>
  ...
</Level-1>

<Level-1>
  ...
  <Level-2>
    ...
    <Level-3>
      ...
      <Level-4>B</Level-4>
      ...
    </Level-3>
    ...
  </Level-2>
  ...
</Level-1>

That is, if there are Li different components at each level i, a total of Product(Li) blocks will be produced (just 2 above, since Level 4 is the only level with more than one component: L1 = L2 = L3 = 1 and L4 = 2, so L1*L2*L3*L4 = 2).
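As a rough illustration of that count, here is a minimal Python sketch using the standard-library ElementTree module. It assumes the repeated elements can be selected by tag name, and it writes the tag as `Level-4` because XML names cannot contain spaces. The helper name `count_blocks` and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def count_blocks(root, leaf_tag):
    # Assumption: every element named leaf_tag yields exactly one output block.
    return sum(1 for _ in root.iter(leaf_tag))

# The simple case above: only Level-4 repeats, so 1 * 1 * 1 * 2 blocks.
simple = ET.fromstring(
    "<Level-1><Level-2><Level-3>"
    "<Level-4>A</Level-4><Level-4>B</Level-4>"
    "</Level-3></Level-2></Level-1>"
)
print(count_blocks(simple, "Level-4"))

# Two Level-3 blocks with two Level-4 children each: 1 * 1 * 2 * 2 blocks.
nested = ET.fromstring(
    "<Level-1><Level-2>"
    "<Level-3><Level-4>A</Level-4><Level-4>B</Level-4></Level-3>"
    "<Level-3><Level-4>C</Level-4><Level-4>D</Level-4></Level-3>"
    "</Level-2></Level-1>"
)
print(count_blocks(nested, "Level-4"))
```

When the repeated blocks do not all have the same number of children, the product no longer applies directly, but counting the selected elements still gives the number of output blocks.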

From what I have seen around, XSLT may be the way to go, but any other solution (e.g., StAX or even JDOM) would do.

A more detailed example, using fictitious information, would be

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Senior Developer">
          <StartDate>01/10/2001</StartDate>
          <Months>38</Months>
        </Job>
        <Job title = "Senior Developer">
          <StartDate>01/12/2004</StartDate>
          <Months>6</Months>
        </Job>
        <Job title = "Senior Developer">
          <StartDate>01/06/2005</StartDate>
          <Months>10</Months>
        </Job>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <EmploymentHistory>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <Jobs>2</Jobs>
      <JobDetails>
        <Job title = "Junior Developer">
          <StartDate>01/05/1999</StartDate>
          <Months>25</Months>
        </Job>
        <Job title = "Junior Developer">
          <StartDate>01/07/2001</StartDate>
          <Months>3</Months>
        </Job>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Available>true</Available>
  <Experience unit="years">6</Experience>
</Employee>

The above data should be shredded into 5 blocks (i.e., one for each different <Job> block), each of which will leave all other tags identical and just have a single <Job> element. So, given the 5 different <Job> blocks in the above example, the transformed ("shredded") XML would be

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Senior Developer">
          <StartDate>01/10/2001</StartDate>
          <Months>38</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Senior Developer">
          <StartDate>01/12/2004</StartDate>
          <Months>6</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Senior Developer">
          <StartDate>01/06/2005</StartDate>
          <Months>10</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <Jobs>2</Jobs>
      <JobDetails>
        <Job title = "Junior Developer">
          <StartDate>01/05/1999</StartDate>
          <Months>25</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <Jobs>2</Jobs>
      <JobDetails>
        <Job title = "Junior Developer">
          <StartDate>01/07/2001</StartDate>
          <Months>3</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>
PNS
  • XSLT is ideal for this; just to understand the question a bit more, you want to repeat the Employee information for each element? Also, where does true come from? – dash Dec 17 '11 at 22:40
  • This is not "flattening". A lot of data seems simply deleted in the provided result -- only the first job-details in the first country is retained in the result. This contradicts your description of the wanted flattening. Please, edit the question and specify the complete result you want from the transformation. – Dimitre Novatchev Dec 17 '11 at 22:43
  • @dash Yes, exactly that repetition. The idea is to create "records" that associate unique values, i.e. every repetitive block (in this case, <Job>) will have to appear as if it were the only one in the file. The <Available> and <Experience> blocks follow and are at the same hierarchy level as <Address>, <Age> and <EmploymentHistory>. – PNS Dec 17 '11 at 22:47
  • @Dimitre Nothing is deleted, there is 1 <Employee> block for every <Job> block, I just didn't write all 5 "flattened" blocks to save screen space. – PNS Dec 17 '11 at 22:49
  • I got it - I didn't scroll down! Your XML isn't well formed by the way, you aren't closing your first JobDetails element, and you are missing an EmploymentHistory opening tag. I'll post a quick xslt for you based on this. – dash Dec 17 '11 at 22:49
  • @dash Well spotted, I added the closing tag. Thanks very much for all the help! – PNS Dec 17 '11 at 22:58
  • @PNS: Please, edit the question and provide the correct result needed -- as of now this is very misleading. Also, I don't see anything flattened in the wanted result. The structure is exactly the same as in the source XML document: ` 01/10/2001 38 ` . Please, explain what is meant by "flattened". – Dimitre Novatchev Dec 17 '11 at 23:04
  • @Dimitre Not understanding a question is no reason for downvoting it, especially given that somebody else took the time to answer it already, which means it is apparently understandable. I have already made several additional comments in the discussion with dash above. Flattened means "shredded" in this context and there are several such cases on StackOverflow, but for database imports, which is not what I was after. Anyway, I added a couple more phrases before the example block. I prefer this to just copy-pasting all 5 such obvious blocks and prolonging the length of an already long question. – PNS Dec 17 '11 at 23:47
  • @PNS: I have seen many thousands of questions and can well tell the "good" from the "not so good" ones. A question like this is definitely not an example of a well-defined question. In case you remove the inconsistencies/contradictions, the confusing terminology and if you provide an exact-wanted result, then this will become a question of acceptable quality and I will be glad to revert my downvote. – Dimitre Novatchev Dec 18 '11 at 00:04
  • @PNS: With all deep respect to the improvements you have done to the question, I don't believe a question is good, that leaves you guessing what the wanted result should be. It must be possible for you to give a minimal, but complete example with the complete wanted result. If you are unable to provide a complete wanted result, is there a guarantee at all that you know what you are asking? This is a fair doubt that arises from the incompleteness of the question. Not only does this cause many people *not* to answer the question, but the net result will have answers that aren't what you wanted. – Dimitre Novatchev Dec 18 '11 at 00:15
  • @Dimitre The exact answer would be to copy-paste the example answer block 5 times, with a different block every time. Is that what you want? – PNS Dec 18 '11 at 00:17
  • @PNS: Yes. Copy/paste should do it. Certainly, if you want to keep the text to a minimum, your example can contain just two or three such blocks. When the wanted result is complete, one can run their transformation and simply compare their result to the result provided in the question. Being unable to do this really discourages a potential answerer to post an answer. – Dimitre Novatchev Dec 18 '11 at 00:22
  • @Dimitre This is not about ability, but about conservation of screen space, for an obvious answer. Thanks to dash, I have put together some working code already. However, because this is a very common problem, for which, surprisingly, no generic solution seems to exist, despite numerous posts asking for it, if you can provide such a generic solution, do it for the community. I have done the copy-pasting you have been asking for, as my little contribution (the original question took over 1 hour to put together, by the way, not to mention the extra time for the edits and the comments). – PNS Dec 18 '11 at 23:20
  • @PNS: Your latest wanted result contains jobs only from US -- this seems to be an error. Could you, please, give us something simpler, that would be consistent and meaningful and wouldn't take you so much time to provide? As for the provided link, it is about real flattening and has almost nothing in common with your shredding problem. – Dimitre Novatchev Dec 19 '11 at 00:14
  • @Dimitre I have edited the example to correct the XML validation issues and add some output beautification. It should be clear now. The title of the article cannot be changed, but what I ask for is XML shredding, but to XML, not to a database format, exactly as per my example. – PNS Dec 19 '11 at 18:19
  • @PNS: I am sorry, but I still don't understand what exactly is wanted. It may be useful if you try to provide a much more simplified, minimal example (may not necessarily be about employee and job history) and you must explain verbally what element goes into what / where in the result. – Dimitre Novatchev Dec 19 '11 at 19:17
  • @Dimitre I added a simpler example of shredding, hope that helps. – PNS Dec 21 '11 at 22:38
  • @PNS: I answered your question and am providing the wanted, generic solution. Please, take a look and let me know what you think. :) BTW, I reverted my downvote and now upvoted your question. – Dimitre Novatchev Dec 21 '11 at 23:09
  • @Dimitre Thanks on both accounts! I will test it more thoroughly in the next few weeks with more "complicated" variations of input and let you know. – PNS Dec 22 '11 at 13:27

2 Answers

4

Here is a generic solution as requested:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Node-set of the elements whose ancestor structure is reproduced, one copy per node -->
 <xsl:param name="pLeafNodes" select="//Level-4"/>

 <xsl:template match="/">
  <t>
    <xsl:call-template name="StructRepro"/>
  </t>
 </xsl:template>

 <xsl:template name="StructRepro">
   <xsl:param name="pLeaves" select="$pLeafNodes"/>

   <xsl:for-each select="$pLeaves">
     <xsl:apply-templates mode="build" select="/*">
      <xsl:with-param name="pChild" select="."/>
      <xsl:with-param name="pLeaves" select="$pLeaves"/>
     </xsl:apply-templates>
   </xsl:for-each>
 </xsl:template>

  <xsl:template mode="build" match="node()|@*">
      <xsl:param name="pChild"/>
      <xsl:param name="pLeaves"/>

     <xsl:copy>
       <xsl:apply-templates mode="build" select="@*"/>

       <!-- count(.|$set) = count($set) is the XSLT 1.0 idiom for "this node is a member of $set" -->
       <xsl:variable name="vLeafChild" select=
         "*[count(.|$pChild) = count($pChild)]"/>

       <xsl:choose>
        <xsl:when test="$vLeafChild">
         <xsl:apply-templates mode="build"
             select="$vLeafChild
                    |
                      node()[not(count(.|$pLeaves) = count($pLeaves))]">
             <xsl:with-param name="pChild" select="$pChild"/>
             <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
         <xsl:apply-templates mode="build" select=
         "node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                or
                 .//*[count(.|$pChild) = count($pChild)]
                ]
         ">

             <xsl:with-param name="pChild" select="$pChild"/>
             <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
        </xsl:otherwise>
       </xsl:choose>
     </xsl:copy>
 </xsl:template>
 <xsl:template match="text()"/>
</xsl:stylesheet>

When applied to the provided simplified (and generic) XML document:

<Level-1>
   ...
   <Level-2>
      ...
      <Level-3>
        ...
        <Level-4>A</Level-4>
        <Level-4>B</Level-4>
        ...
      </Level-3>
      ...
   </Level-2>
   ...
</Level-1>

the wanted, correct result is produced:

<Level-1>
   ...
   <Level-2>
      ...
      <Level-3>
         <Level-4>A</Level-4>
      </Level-3>
      ...
   </Level-2>
   ...
</Level-1>
<Level-1>
   ...
   <Level-2>
      ...
      <Level-3>
         <Level-4>B</Level-4>
      </Level-3>
      ...
   </Level-2>
   ...
</Level-1>

Now, if we change the line:

 <xsl:param name="pLeafNodes" select="//Level-4"/>

to:

 <xsl:param name="pLeafNodes" select="//Job"/>

and apply the transformation to the Employee XML document:

<Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
        <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
                <Job title = "Senior Developer">
                    <StartDate>01/10/2001</StartDate>
                    <Months>38</Months>
                </Job>
                <Job title = "Senior Developer">
                    <StartDate>01/12/2004</StartDate>
                    <Months>6</Months>
                </Job>
                <Job title = "Senior Developer">
                    <StartDate>01/06/2005</StartDate>
                    <Months>10</Months>
                </Job>
            </JobDetails>
        </Employment>
    </EmploymentHistory>
    <EmploymentHistory>
        <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
                <Job title = "Junior Developer">
                    <StartDate>01/05/1999</StartDate>
                    <Months>25</Months>
                </Job>
                <Job title = "Junior Developer">
                    <StartDate>01/07/2001</StartDate>
                    <Months>3</Months>
                </Job>
            </JobDetails>
        </Employment>
    </EmploymentHistory>
    <Available>true</Available>
    <Experience unit="years">6</Experience>
</Employee>

we again get the wanted, correct result:

<t>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
               <Job title="Senior Developer">
                  <StartDate>01/10/2001</StartDate>
                  <Months>38</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
               <Job title="Senior Developer">
                  <StartDate>01/12/2004</StartDate>
                  <Months>6</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
               <Job title="Senior Developer">
                  <StartDate>01/06/2005</StartDate>
                  <Months>10</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
               <Job title="Junior Developer">
                  <StartDate>01/05/1999</StartDate>
                  <Months>25</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
               <Job title="Junior Developer">
                  <StartDate>01/07/2001</StartDate>
                  <Months>3</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
</t>

Explanation: The processing is done in a named template (StructRepro) and is controlled by a single external parameter named pLeafNodes, which must contain a node-set of all nodes whose "upward structure" is to be reproduced in the result.
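For readers who prefer a DOM-style route, the same idea can be sketched in Python with the standard-library ElementTree module. This is not a port of the stylesheet, only a simplified sketch under stated assumptions: the "leaf" nodes are selected by tag name rather than by an arbitrary node-set, they are not nested inside one another, and any subtree that contains a different selected leaf is dropped. The names `shred`, `build` and `contains_leaf`, and the `Note` element in the sample, are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def shred(root, leaf_tag):
    """Return one rebuilt copy of the tree per element named leaf_tag."""
    leaves = list(root.iter(leaf_tag))
    leaf_ids = {id(e) for e in leaves}

    def contains_leaf(elem):
        # True if elem is, or has as a descendant, one of the selected leaves.
        return any(id(d) in leaf_ids for d in elem.iter())

    def build(elem, target):
        # Shallow copy: tag, attributes and text, but no children yet.
        new = ET.Element(elem.tag, dict(elem.attrib))
        new.text, new.tail = elem.text, elem.tail
        for child in elem:
            if child is target:
                # The chosen leaf is copied whole.
                new.append(copy.deepcopy(child))
            elif any(d is target for d in child.iter()):
                # An ancestor of the chosen leaf: rebuild it recursively.
                new.append(build(child, target))
            elif not contains_leaf(child):
                # Unrelated content is copied whole; other leaves are skipped.
                new.append(copy.deepcopy(child))
        return new

    return [build(root, leaf) for leaf in leaves]

doc = ET.fromstring(
    "<Level-1><Note>n</Note><Level-2><Level-3>"
    "<Level-4>A</Level-4><Level-4>B</Level-4>"
    "</Level-3></Level-2></Level-1>"
)
for block in shred(doc, "Level-4"):
    print(ET.tostring(block, encoding="unicode"))
```

Each returned element is an independent tree, so the blocks can be serialized separately or appended to a wrapper element. The sketch rescans subtrees repeatedly, which is fine for small inputs but would need a precomputed ancestor index for large files.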

Dimitre Novatchev
  • Excellent and concise! Is there any way of avoiding putting down the "Level-4" tag at all? That would make it 100% generic. :-) – PNS Dec 22 '11 at 03:21
  • @PNS: Yes, we can define this code as a template (named or moded) and the parameter it uses can be a node-set, containing all nodes that must be processed as "leaf" nodes. I am also trying to find a better word to describe this process. "Shred" is obviously the opposite of what happens here (and at the root of my so long confusion). In fact, what we have is "structure reproduction" -- something like reproduction by cell division. – Dimitre Novatchev Dec 22 '11 at 03:44
  • @PNS: I edited my answer to provide the fully generic solution you were looking for. Enjoy. :) – Dimitre Novatchev Dec 22 '11 at 04:05
  • @PNS: Edited again. Now the code can be applied to both XML documents provided by you and needs only setting a global (external) parameter with the node-set of the "leaf nodes" that define the structure reproduction. Exactly the generic solution you were hoping for. – Dimitre Novatchev Dec 22 '11 at 05:18
  • As already mentioned in my other comments, excellent and thanks! Out of academic interest, would there be a way of not even supplying the "end node" name (e.g., by restricting the maximum "depth" the transformation would be applied)? – PNS Dec 22 '11 at 13:33
  • @PNS: I thought about this, but there appears to be no general *and* reliable way to specify the wanted nodes. Even in your example the `Job` elements aren't "physical" leaves -- they have children elements. If just depth is specified, there might exist other elements at the same depth that we don't want to use as cores for the reproduction. Finally, why prohibit these nodes to be at different depth? And wouldn't it be error-prone to count depth manually? I think that the current solution is more generic than a solution that would require a "depth argument". – Dimitre Novatchev Dec 22 '11 at 13:44
  • Fair enough. Perhaps the answer to what I was saying is to just use JDOM. Anyway, your code is very close to a 100% generic solution. – PNS Dec 22 '11 at 14:12
  • @PNS: The solution is not "very close to a 100% generic solution." -- it is 100% generic. Remember that *generic* and *parameterized* are synonyms. The `StructRepro` template doesn't contain even a single hardcoded value. Any valuable solution must be given some parameter(s) and this is the opposite of lack of parameterization. – Dimitre Novatchev Dec 23 '11 at 02:57
  • Agreed. I just interpret "generic" as "fully autodetected", i.e. no parameterization at all. But, as I have already noted in previous comments above, your solution is excellent and I am sure it will help many others looking up this surprisingly underworked problem. – PNS Dec 24 '11 at 00:01
  • I just tried the generic solution (after using Dash's customized one thus far), employing Java to run the XSLT transformation, and apparently it does not work: in the example, the "Employee" tag is correctly repeated 5 times, but with all job details and not one each time, as it should. Is there some address I could send you the details? – PNS Jan 14 '12 at 14:07
  • @PNS: Are you saying that you do not get the same output that I am getting? If so, the reason can be that either you have modified the source XML or you have modified the XSLT code, or your XSLT processor may be buggy. I have run this transformation with many different XSLT processors and with all of them I get the result that is presented in my answer. – Dimitre Novatchev Jan 14 '12 at 15:39
  • @PNS: With all of the following XSLT processors I get (and you or anybody else can run the transformation and confirm this) the same result: MSXML3, MSXML4, MSXML6, .NET XslCompiledTransform, .NET XslTransform, Saxon 6.5.4, Saxon 9.1.07, XQSharp. – Dimitre Novatchev Jan 14 '12 at 15:49
  • I am sure you are getting what you are saying, but I haven't modified the XSLT and it doesn't work with the Java Transformer class (which is not buggy). I can send you full source code, if you provide an e-mail address. Also, the generic XSLT probably doesn't work if, among the shredded tags there are other tags that must be repeated (e.g., in the example above, the repeated part of the Level 4 tag includes tags before and after (denoted as ...) and not only A). Again, if you provide an e-mail address, I can give you full source code. Or I could edit in this question. – PNS Jan 14 '12 at 22:44
  • @PNS: I am curious to know how you arrived at the conclusion that your XSLT processor isn't buggy given the fact that it produces a different result from the result produced by 9 other XSLT processors? – Dimitre Novatchev Jan 15 '12 at 15:43
3

Given the following XML:

<?xml version="1.0" encoding="utf-8" ?>
<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Developer">
          <StartDate>01/10/2001</StartDate>
          <Months>38</Months>
        </Job>
        <Job title = "Developer">
          <StartDate>01/12/2004</StartDate>
          <Months>6</Months>
        </Job>
        <Job title = "Developer">
          <StartDate>01/06/2005</StartDate>
          <Months>10</Months>
        </Job>
      </JobDetails>
    </Employment>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <Jobs>2</Jobs>
      <JobDetails>
        <Job title = "Developer">
          <StartDate>01/05/1999</StartDate>
          <Months>25</Months>
        </Job>
        <Job title = "Developer">
          <StartDate>01/07/2001</StartDate>
          <Months>3</Months>
        </Job>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Available>true</Available>
  <Experience unit="years">6</Experience>
</Employee>

The following XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
      <Output>
        <xsl:apply-templates select="//Employee/EmploymentHistory/Employment/JobDetails/Job" />
      </Output>
    </xsl:template>

  <xsl:template match="//Employee/EmploymentHistory/Employment/JobDetails/Job">
    <Employee>
      <xsl:attribute name="name">
        <xsl:value-of select="ancestor::Employee/@name"/>
      </xsl:attribute>
      <Address>
        <xsl:value-of select="ancestor::Employee/Address"/>
      </Address>
      <Age>
        <xsl:value-of select="ancestor::Employee/Age"/>
      </Age>
      <EmploymentHistory>
        <Employment>
          <xsl:attribute name="country">
            <xsl:value-of select="ancestor::Employment/@country"/>
          </xsl:attribute>
          <Comment>
            <xsl:value-of select="ancestor::Employment/Comment"/>
          </Comment>
          <Jobs>
            <xsl:value-of select="ancestor::Employment/Jobs"/>
          </Jobs>
          <JobDetails>
            <xsl:copy-of select="."/>
          </JobDetails>
          <Available>
            <xsl:value-of select="ancestor::Employee/Available"/>
          </Available>
          <Experience>
            <xsl:attribute name="unit">
              <xsl:value-of select="ancestor::Employee/Experience/@unit"/>
            </xsl:attribute>
            <xsl:value-of select="ancestor::Employee/Experience"/>
          </Experience>
        </Employment>
      </EmploymentHistory>
    </Employee>

  </xsl:template>


</xsl:stylesheet>

Gives the following output:

<?xml version="1.0" encoding="utf-8"?>
<Output>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <Jobs>3</Jobs>
        <JobDetails>
          <Job title="Developer">
          <StartDate>01/10/2001</StartDate>
          <Months>38</Months>
        </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <Jobs>3</Jobs>
        <JobDetails>
          <Job title="Developer">
          <StartDate>01/12/2004</StartDate>
          <Months>6</Months>
        </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <Jobs>3</Jobs>
        <JobDetails>
          <Job title="Developer">
          <StartDate>01/06/2005</StartDate>
          <Months>10</Months>
        </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="UK">
        <Comment>List of previous jobs in the UK</Comment>
        <Jobs>2</Jobs>
        <JobDetails>
          <Job title="Developer">
            <StartDate>01/05/1999</StartDate>
            <Months>25</Months>
          </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="UK">
        <Comment>List of previous jobs in the UK</Comment>
        <Jobs>2</Jobs>
        <JobDetails>
          <Job title="Developer">
            <StartDate>01/07/2001</StartDate>
            <Months>3</Months>
          </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
</Output>

Note that I've added an Output root element to ensure the document is well formed.

Is this what you wanted?

You might also be able to use xsl:copy to copy the higher-level elements, but I need to think about this one a bit more. With the above XSLT you have more control, but you also have to redefine your elements...
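The copy-everything-then-remove-the-siblings idea is easier to see outside XSLT. Below is a minimal Python sketch with the standard-library ElementTree module, not an XSLT solution. It assumes the repeated elements are selected by tag name and are not nested inside one another. The function name `copy_and_prune` and the shortened sample document are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def copy_and_prune(root, leaf_tag):
    """Return one full copy of the tree per leaf_tag element, pruned to a single leaf."""
    blocks = []
    for keep in range(len(list(root.iter(leaf_tag)))):
        clone = copy.deepcopy(root)
        parent_of = {child: parent for parent in clone.iter() for child in parent}
        leaves = list(clone.iter(leaf_tag))

        # The kept leaf and all of its ancestors must survive.
        keep_path = set()
        node = leaves[keep]
        while node is not None:
            keep_path.add(node)
            node = parent_of.get(node)

        # For every other leaf, find its highest ancestor-or-self off the kept path.
        doomed = set()
        for leaf in leaves:
            if leaf in keep_path:
                continue
            node = leaf
            while parent_of[node] not in keep_path:
                node = parent_of[node]
            doomed.add(node)
        for node in doomed:
            parent_of[node].remove(node)
        blocks.append(clone)
    return blocks

xml_text = """<Employee name="A Name">
  <EmploymentHistory><Employment country="US"><JobDetails>
    <Job><Months>38</Months></Job><Job><Months>6</Months></Job>
  </JobDetails></Employment></EmploymentHistory>
  <EmploymentHistory><Employment country="UK"><JobDetails>
    <Job><Months>25</Months></Job>
  </JobDetails></Employment></EmploymentHistory>
  <Available>true</Available>
</Employee>"""
blocks = copy_and_prune(ET.fromstring(xml_text), "Job")
for block in blocks:
    print(block.find(".//Employment").get("country"), block.find(".//Months").text)
```

Because the whole tree is copied first, no element or attribute name other than the selected tag has to be spelled out. The cost is one full copy of the document per output block, which may matter for large files.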

dash
  • I am actually looking for generic XSLT code (like in the answer to http://stackoverflow.com/questions/1900184/how-to-break-the-tree-structure-of-the-xml-document-to-desired-one), if possible. But, even if that can't be done, I am grateful for the help anyway (and testing it now)! :-) – PNS Dec 17 '11 at 23:12
  • The problem with that XSLT is that it is doing exactly what Dimitre is talking about - it's flattening a hierarchy. What you actually want to do is repeat all of JobDetails ancestors, but exclude all of the sibling JobDetails. An additional problem is the Employment country="" subtree - this makes it harder to establish which elements you don't want to keep. If that makes sense? – dash Dec 17 '11 at 23:18
  • The flattening procedure is very deterministic: every element (in this case, <Job>) that is repeated produces a new full block with all the other information intact and only itself being the difference. In general, this applies to all hierarchy levels. If, say, there were M elements repeated in a level above and below each of these M elements there were N different elements, then a total of MxN blocks would be produced. Imagine needing to put this in a database, where each row would be a different block "path" every time. Essentially, it is XML shredding. – PNS Dec 17 '11 at 23:28
  • I get that :-) The problem is that a generic solution would probably involve xsl:copy-of and excluding all of the other siblings is difficult. This actually might be easier in DOM processing! – dash Dec 17 '11 at 23:35
  • @PNS: Please, clarify your last comment above and include it in the question, giving a complete and correct wanted result. If the operation has nothing to do with "flattening" call it with a more appropriate, meaningful and not confusing name. – Dimitre Novatchev Dec 17 '11 at 23:38
  • @dash JDOM would also be OK, especially if it provided a more generic answer. – PNS Dec 18 '11 at 00:04
  • @Dimitre I added a couple more phrases in the question and replied to your comment above. I trust this is satisfactory for you. – PNS Dec 18 '11 at 00:04
  • @DimitreNovatchev Wow, I've been looking through your XSLT answers. Amazing - I thought I knew XSLT ;-) – dash Dec 18 '11 at 00:23
  • @dash: You are welcome. If you enjoy XSLT you may find my blog useful, too: http://dnovatchev.wordpress.com – Dimitre Novatchev Dec 18 '11 at 00:26
  • @Dimitre Can a generic answer (i.e., without having to supply tag or attribute names) be created for my case? This is actually very interesting more generally, for shredding an XML file. – PNS Dec 18 '11 at 01:14
  • @PNS: If/when I know what you want to do exactly, I will be able to answer whether a generic solution could exist. Please, try to provide a minimal, but complete example and to explain the requirements for the transformation. – Dimitre Novatchev Dec 18 '11 at 03:59
  • @Dimitre I have edited the original question to show the exact transformation of the example XML it contains. The objective is to find a generic solution, i.e. one that "shreds" the original XML recursively, without any idea about the tag and attribute names. See, for example, http://stackoverflow.com/questions/1900184/how-to-break-the-tree-structure-of-the-xml-document-to-desired-one. It would be a great contribution if you could put together such a solution. – PNS Dec 18 '11 at 23:27
  • @PNS, dash: I have produced the generic solution that PNS has been asking for. – Dimitre Novatchev Dec 22 '11 at 05:54
  • @Dimitre A piece of art, I would say! From what I have seen around, there is hardly any fully open source, totally royalty-free and generic code that does XML shredding and people keep asking. You have made a truly useful contribution to the community I think, and many thanks for that! – PNS Dec 22 '11 at 13:29
  • @PNS: Thank *you* for the excellent question. Now I completely understand why it was challenging even to formulate the problem. – Dimitre Novatchev Dec 22 '11 at 13:32
  • @Dimitre Yes, it is very time consuming even to describe it, let alone find a solution. – PNS Dec 23 '11 at 23:59