What would be the best (preferably most efficient) way to do the following?

Suppose I have an XML document like this:

<comments question_id="123">
    <comment id="1">
       This is the first comment
    </comment>
    <comment id="2">
       This is the second comment
    </comment>
</comments>

Now, given that I specify the “path” to each data block, in this case:

path: /comments/comment

I would like to break up the document into n sub-parts, in this case 2:

<comments question_id="123">
    <comment id="1">
       This is the first comment
    </comment>
</comments>

<comments question_id="123">
    <comment id="2">
       This is the second comment
    </comment>
</comments>

So, essentially, I want to get each node selected by “/comments/comment” while also retaining the data of all its “outer” parent nodes.

EDIT:

Note: this needs to be dynamic, or generic; the XML above is just an example. I want to be able to transform any XML document this way, given a “path” that identifies each data node. Everything else is the outer body of the XML.

Filburt
Larry
  • XPath allows you to select nodes in an existing document, not to transform it. For that task you use XSLT, not XPath. Do you want to use XSLT? Do you want several output documents? Or just all those fragments combined in a single document? – Martin Honnen Feb 21 '12 at 14:54
  • @MartinHonnen Thanks for your comment. I don’t mind using Xslt, so I’ve edited the question to this effect. I would actually like four separate documents. – Larry Feb 21 '12 at 15:06
  • @Larry: Yes, a generic solution exists. – Dimitre Novatchev Feb 21 '12 at 15:41
  • Larry, if you want several result documents from one XSLT transformation you should look into XSLT 2.0 and `xsl:result-document`: http://www.w3.org/TR/xslt20/#creating-result-trees. – Martin Honnen Feb 21 '12 at 16:29
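
For separate result documents without an XSLT 2.0 processor (the XSLT processor bundled with the JDK implements XSLT 1.0), one possible route is to do the split directly on the DOM. The following is only a minimal sketch, not a tuned solution: `XmlSplitter` and `split` are hypothetical names, and it assumes the path selects elements, that no selected element is nested inside another selected element, and that the document fits in memory.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: one output document per node selected by `path`,
// each keeping the surrounding ("outer") structure of the source.
final class XmlSplitter {

    static List<String> split(String xml, String path) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document source = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            int total = select(source, path).getLength();

            // An identity transformer is used only to serialize each copy.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> parts = new ArrayList<>();
            for (int keep = 0; keep < total; keep++) {
                // Deep-copy the whole document, then drop every selected
                // node except the one at position `keep`.
                Document copy = (Document) source.cloneNode(true);
                NodeList matches = select(copy, path);
                for (int i = 0; i < matches.getLength(); i++) {
                    if (i != keep) {
                        Node n = matches.item(i);
                        n.getParentNode().removeChild(n);
                    }
                }
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(copy), new StreamResult(out));
                parts.add(out.toString());
            }
            return parts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not split document", e);
        }
    }

    private static NodeList select(Document doc, String path) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) {
        String xml = "<comments question_id=\"123\">"
                + "<comment id=\"1\">This is the first comment</comment>"
                + "<comment id=\"2\">This is the second comment</comment>"
                + "</comments>";
        for (String part : split(xml, "/comments/comment")) {
            System.out.println(part);
        }
    }
}
```

Because every part is produced from a fresh copy of the whole document, the cost grows roughly with the square of the number of selected nodes, so this is aimed at clarity rather than at the "most efficient" requirement.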

2 Answers

If you actually want to transform your XML using XSLT, the stylesheet should look like this:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <!-- You need a document root to produce valid Xml output -->
        <yourdocumentroot>
            <xsl:apply-templates />
        </yourdocumentroot>
    </xsl:template>

    <xsl:template match="comments">
        <xsl:apply-templates />
    </xsl:template>

    <xsl:template match="comment">
        <xsl:element name="comments">
            <xsl:attribute name="question_id">
                <xsl:value-of select="ancestor::comments/@question_id"/>
            </xsl:attribute>
            <xsl:element name="comment">
                <xsl:attribute name="id">
                    <xsl:value-of select="./@id" />
                </xsl:attribute>
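                <!-- normalize-space trims the surrounding whitespace so the output matches the listing below -->
                <xsl:value-of select="normalize-space(.)" />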
            </xsl:element>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

... will give you the desired output:

<?xml version="1.0" encoding="utf-8"?>
<yourdocumentroot>
    <comments question_id="123">
        <comment id="1">This is the first comment</comment>
    </comments>
    <comments question_id="123">
        <comment id="2">This is the second comment</comment>
    </comments>
</yourdocumentroot>
Filburt
  • Thanks, but I forgot to mention, I wanted something generic. I have now edited this into my question. Could you update your response to enable this. – Larry Feb 21 '12 at 15:09

A generic "shredding" solution can be found in my answer to this question: https://stackoverflow.com/a/8597577/36305

Here is the complete transformation:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:param name="pLeafNodes" select="//comment"/>

     <xsl:template match="/">
      <t>
        <xsl:call-template name="StructRepro"/>
      </t>
     </xsl:template>

     <xsl:template name="StructRepro">
       <xsl:param name="pLeaves" select="$pLeafNodes"/>

       <xsl:for-each select="$pLeaves">
         <xsl:apply-templates mode="build" select="/*">
          <xsl:with-param name="pChild" select="."/>
          <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
       </xsl:for-each>
     </xsl:template>

      <xsl:template mode="build" match="node()|@*">
          <xsl:param name="pChild"/>
          <xsl:param name="pLeaves"/>

         <xsl:copy>
           <xsl:apply-templates mode="build" select="@*"/>

           <xsl:variable name="vLeafChild" select=
             "*[count(.|$pChild) = count($pChild)]"/>

           <xsl:choose>
            <xsl:when test="$vLeafChild">
             <xsl:apply-templates mode="build"
                 select="$vLeafChild
                        |
                          node()[not(count(.|$pLeaves) = count($pLeaves))]">
                 <xsl:with-param name="pChild" select="$pChild"/>
                 <xsl:with-param name="pLeaves" select="$pLeaves"/>
             </xsl:apply-templates>
            </xsl:when>
            <xsl:otherwise>
             <xsl:apply-templates mode="build" select=
             "node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                    or
                     .//*[count(.|$pChild) = count($pChild)]
                    ]
             ">

                 <xsl:with-param name="pChild" select="$pChild"/>
                 <xsl:with-param name="pLeaves" select="$pLeaves"/>
             </xsl:apply-templates>
            </xsl:otherwise>
           </xsl:choose>
         </xsl:copy>
     </xsl:template>
     <xsl:template match="text()"/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<comments question_id="123">
    <comment id="1">
      This is the first comment
  </comment>
    <comment id="2">
      This is the second comment
 </comment>
</comments>

the wanted, correct result is produced:

<t>
   <comments question_id="123">
      <comment id="1">
      This is the first comment
  </comment>
   </comments>
   <comments question_id="123">
      <comment id="2">
      This is the second comment
 </comment>
   </comments>
</t>
Dimitre Novatchev
  • Thanks so much, this looks excellent. One question: so is the only variable in this code the line where we specify `select="//comment"`? If I wanted this completely generic, as say I wanted to apply this transformation to a series of XML docs from within Java, what’s the best way to go about this? – Larry Feb 21 '12 at 16:00
  • @Larry: This is not a variable -- this is a global `xsl:param` and its main purpose is exactly what you are asking for -- that it can be set externally by the program that invokes the transformation. How a parameter to the transformation is set is implementation-dependent. You have to read the documentation of your XSLT processor. I know how to do this for .NET XslCompiledTransform or for MSXML or for Saxon. – Dimitre Novatchev Feb 21 '12 at 16:04
  • Could you please let me know: I am using JAXP in Java, and trying to set the param using: `trans.setParameter("pLeafNodes", "//comment")`, although I get the following error: `SystemId Unknown; Line #0; Column #0; Can not convert #STRING to a NodeList!` (When no param is set, the transformation works well!) – Larry Feb 22 '12 at 17:23
  • @Larry: I haven't used JAXP, but the error is obvious in your case -- you are passing a string (the unevaluated XPath expression), but you should be passing something like an `XmlNodelist`, which you get as result of evaluating the XPath expression `//comment`. – Dimitre Novatchev Feb 22 '12 at 17:29
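
As the error in the thread shows, a `String` passed to JAXP's `setParameter` reaches the stylesheet as a string, not as a node-set. Whether a DOM `NodeList` passed the same way keeps its identity with the source nodes depends on the processor, so the sketch below sidesteps the question: it writes the XPath into the `select` attribute of the global `xsl:param` before compiling the stylesheet. This is an assumption-laden workaround, not the method from the answer. `LeafPathParam`, `countLeaves`, `escapeAttribute` and the `@LEAF_PATH@` token are hypothetical names, and the tiny stylesheet is a stand-in that only counts the selected nodes; in practice the same substitution would be applied to the full transformation above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class LeafPathParam {

    // Stand-in stylesheet: only the xsl:param declaration mirrors the answer.
    // @LEAF_PATH@ is a hypothetical placeholder replaced before compilation.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='pLeafNodes' select=\"@LEAF_PATH@\"/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count($pLeafNodes)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // The path ends up inside a double-quoted XML attribute, so escape
    // the characters that would otherwise break the stylesheet markup.
    static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Compiles the stylesheet with the given path baked in and returns
    // the text output (here: the number of nodes the path selects).
    static String countLeaves(String xml, String leafPath) {
        String xslt = STYLESHEET.replace("@LEAF_PATH@", escapeAttribute(leafPath));
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<comments question_id=\"123\">"
                + "<comment id=\"1\">This is the first comment</comment>"
                + "<comment id=\"2\">This is the second comment</comment>"
                + "</comments>";
        System.out.println(countLeaves(xml, "//comment"));
    }
}
```

The trade-off is that the stylesheet is recompiled for every distinct path, which is usually acceptable when the path changes per document type rather than per document.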