7

What XSL script will indent my data?

For example:

 <dtd name="cited">
 <XMLDOC>
 <cited year="2010">
 <case>
 No.&nbsp;275 v. M.N.R. 
 <cite>
 <yr>
 2010 
 <pno cite="20101188">10</pno> 
 </yr>
 </cite>
 </case>
 </cited>
 </XMLDOC>
 <XMLDOC>
 <case>
 Wellesley St.
 <cite>
 <yr>
 2010 
 <pno cite="20105133">9</pno> 
 </yr>
 </cite>
 </case>
 </XMLDOC>
 </dtd>

To:

<dtd name="cited">
  <XMLDOC>
    <cited year="2010"></cited>
    <case>
      No.&nbsp;275 v. M.N.R.
    </case> 
    <cite>
    </cite>
    <yr>
      2010 
    </yr>
    <pno cite="20101188">10</pno> 
  </XMLDOC>
  <XMLDOC>
    <case>
      Wellesley St 
    </case>
    <cite>
    </cite>
    <yr>
      2010 
    </yr>
    <pno cite="20105133">9</pno> 
  </XMLDOC>
</dtd>

Thank you!

Related

sgml to xml convertion

From comments:

what i want is to apply the correct closing tags like

<yr></yr>
<pno cite="20101188">10</pno>

instead of

<yr>
2010 
<pno cite="20101188">10</pno>
</yr>
Community
atif
  • 4
    There is a general misunderstanding here. You cannot receive the output you posted from the input just using indenting. Your question and its title don't match. – khachik Dec 15 '10 at 20:48

3 Answers

22

Use a simple identity transformation with indent="yes" specified on the <xsl:output> declaration:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

This transformation, when applied to the provided XML document (with the undefined entity &nbsp; replaced by the equivalent character reference &#xA0;):

 <dtd name="cited">
 <XMLDOC>
 <cited year="2010">
 <case>
 No.&#xA0;275 v. M.N.R.
 <cite>
 <yr>
 2010
 <pno cite="20101188">10</pno>
 </yr>
 </cite>
 </case>
 </cited>
 </XMLDOC>
 <XMLDOC>
 <case>
 Wellesley St.
 <cite>
 <yr>
 2010
 <pno cite="20105133">9</pno>
 </yr>
 </cite>
 </case>
 </XMLDOC>
 </dtd>

produces, when run with AltovaXML:

<dtd name="cited">
    <XMLDOC>
        <cited year="2010">
            <case>
 No. 275 v. M.N.R.
 <cite>
                    <yr>
 2010
 <pno cite="20101188">10</pno></yr>
                </cite></case>
        </cited>
    </XMLDOC>
    <XMLDOC>
        <case>
 Wellesley St.
 <cite>
                <yr>
 2010
 <pno cite="20105133">9</pno></yr>
            </cite></case>
    </XMLDOC>
</dtd>

The same transformation, when run with Saxon 6.5.4 produces:

<dtd name="cited">

   <XMLDOC>

      <cited year="2010">

         <case>
 No. 275 v. M.N.R.
 <cite>

               <yr>
 2010
 <pno cite="20101188">10</pno>

               </yr>

            </cite>

         </case>

      </cited>

   </XMLDOC>

   <XMLDOC>

      <case>
 Wellesley St.
 <cite>

            <yr>
 2010
 <pno cite="20105133">9</pno>

            </yr>

         </cite>

      </case>

   </XMLDOC>

</dtd>

So the output differs considerably depending on which XSLT 1.0 processor is used. Saxon keeps every whitespace-only text node from the source, and these nodes, combined with the added indentation, produce too much whitespace.

The workaround is to strip the whitespace-only text nodes explicitly with:

<xsl:strip-space elements="*"/>

So, when this transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

is run with Saxon against the same source XML document, the output is now:

<dtd name="cited">
   <XMLDOC>
      <cited year="2010">
         <case>
 No. 275 v. M.N.R.
 <cite>
               <yr>
 2010
 <pno cite="20101188">10</pno>
               </yr>
            </cite>
         </case>
      </cited>
   </XMLDOC>
   <XMLDOC>
      <case>
 Wellesley St.
 <cite>
            <yr>
 2010
 <pno cite="20105133">9</pno>
            </yr>
         </cite>
      </case>
   </XMLDOC>
</dtd>

AltovaXML and a number of other XSLT 1.0 processors (.NET's XslCompiledTransform and XslTransform) also produce nicely indented output when running the last transformation.
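
One caveat with strip-space elements="*": it removes every whitespace-only text node, including a lone space between two inline child elements in mixed content. A minimal sketch of narrowing it, assuming (hypothetically) that whitespace inside case must be kept; the choice of case is only an illustration and not something the question requires:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Strip whitespace-only text nodes everywhere ... -->
 <xsl:strip-space elements="*"/>
 <!-- ... except directly inside <case> (assumed example element).
      A named element takes precedence over the "*" wildcard. -->
 <xsl:preserve-space elements="case"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

Whether the result still looks well indented around the preserved element depends on the processor, as the outputs above show.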

UPDATE:

In a recent comment, the OP revealed an important new requirement, which makes this problem much more than just "indentation":

From comments:

what i want is to apply the correct closing tags like

<yr></yr>  
<pno cite="20101188">10</pno>  

instead of

<yr>  
2010   
<pno cite="20101188">10</pno>  
</yr>

Here is a transformation that produces the wanted output:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="yr">
  <yr>
    <xsl:apply-templates select="text()[1]"/>
  </yr>
  <xsl:apply-templates select="*"/>
 </xsl:template>
</xsl:stylesheet>
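
The yr template handles only that one element, while the target output in the question also moves the children of cited, case and cite out to become following siblings. A minimal sketch of the same idea applied to all four, assuming these are the only elements to flatten (the list of names is taken from the sample data, not from any schema):

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copy everything that is not overridden below -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Flatten: keep the element with its attributes and its own text,
      then emit its child elements after it as siblings -->
 <xsl:template match="cited|case|cite|yr">
  <xsl:copy>
   <xsl:apply-templates select="@*|text()"/>
  </xsl:copy>
  <xsl:apply-templates select="*"/>
 </xsl:template>
</xsl:stylesheet>
```

Text is copied as-is, so the line breaks around values such as 2010 remain inside the element, and elements left without content may be serialized in the short form (<cite/>), which is equivalent to <cite></cite>.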
Dimitre Novatchev
  • Dimitre, what I want is to apply the correct closing tags, like <yr></yr> <pno cite="20101188">10</pno> instead of <yr> 2010 <pno cite="20101188">10</pno> </yr>. Thanks for your help. – atif Dec 15 '10 at 21:36
  • 1
    @atif: Then the title of your question is absolutely misleading. Please, edit your question to state this important information. What you want to get as output cannot be done simply by indenting! – Dimitre Novatchev Dec 15 '10 at 22:00
  • 1
    @atif: I have updated my answer to solve your latest problem. In the future, please, formulate better your questions and their titles. – Dimitre Novatchev Dec 15 '10 at 22:09
2
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- output xml and indent -->
<xsl:output method="xml" indent="yes"/>
<!-- copy all elements and their attributes -->
<xsl:template match="* | @*">
<xsl:copy><xsl:copy-of select="@*"/><xsl:apply-templates/></xsl:copy>
</xsl:template>
</xsl:stylesheet>
dacracot
  • Your stylesheet doesn't match PIs or comments. See [Identity transformation](http://en.wikipedia.org/wiki/Identity_transform). – khachik Dec 15 '10 at 20:50
  • Dacracot, you just applied the identity template; what I want is to apply the correct closing tags, like <yr></yr> <pno cite="20101188">10</pno> instead of <yr> 2010 <pno cite="20101188">10</pno> </yr>. Thanks – atif Dec 15 '10 at 20:50
  • 1
    @atif: This comment should be added to the question. –  Dec 15 '10 at 21:24
  • @atif, he didn't just apply the identity template, he copied the input to the output while applying indentation. Which is what you initially asked for. – LarsH Dec 15 '10 at 23:58
0

In the Java world, Apache Xalan can help. All you need to do is add the indent and xslt:indent-amount attributes, plus the xmlns:xslt namespace declaration, to the xsl:output element.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="utf-8" indent="yes"
              xslt:indent-amount="3" xmlns:xslt="http://xml.apache.org/xslt" />
...
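
For reference, a complete stylesheet built around that xsl:output line might look like the sketch below. The identity template and the strip-space declaration are assumptions borrowed from the first answer, not part of the truncated snippet above; xslt:indent-amount is a Xalan extension attribute, so other processors are not expected to act on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xslt="http://xml.apache.org/xslt">

  <!-- indent="yes" turns indentation on; xslt:indent-amount (Xalan only)
       sets the number of spaces per nesting level -->
  <xsl:output method="xml" encoding="utf-8" indent="yes"
              xslt:indent-amount="3"/>

  <!-- Assumed addition: drop whitespace-only text nodes from the source -->
  <xsl:strip-space elements="*"/>

  <!-- Assumed addition: identity rule that copies the input unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Declaring the xslt prefix on xsl:stylesheet instead of on xsl:output is equivalent; it only needs to be in scope where the attribute is used.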

And you can run the XSL with Ant:

<?xml version="1.0"?>
<project name="My XSL conversion" default="myxsltarget" basedir=".">
  <target name="myxsltarget">
    <xslt basedir="in" destdir="out" extension=".xml" style="myxsl.xsl"/>
  </target>
</project>
Donato Szilagyi