
I have an XML document with many unused namespace declarations, like this:

<?xml version="1.0" encoding="UTF-8"?>
<ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com" xmlns:ns3="http://www.c.com" xmlns:ns4="http://www.d.com">
    <ns1:Body>
        <ns2:a>
            <ns2:b>data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope> 

I would like to remove the unused namespaces without having to specify in the XSLT which ones to remove or keep. The resulting XML should be this:

<?xml version="1.0" encoding="UTF-8"?>
<ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com">
    <ns1:Body>
        <ns2:a>
            <ns2:b>data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope> 

I've googled a lot but haven't found a solution to this particular issue. Is there one?

Thanks.

PS: I'm not 100% sure, but I think it needs to be XSLT 1.0.

mdiez
  • An in-scope namespace URI that is not part of any QName is not necessarily unused. Think of schema definitions, for example. –  Jan 04 '11 at 18:00
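
To illustrate the point of the comment above, here is a small hypothetical fragment. The element name, the type name and the use of xsi:type are assumptions made for illustration and do not come from the question. The prefix ns3 appears in no element or attribute name, yet its declaration is still needed to resolve the QName inside the attribute value:

```xml
<!-- Hypothetical example: ns3 is referenced only inside an attribute value -->
<ns1:item xmlns:ns1="http://www.a.com"
          xmlns:ns3="http://www.c.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:type="ns3:SpecialItem"/>
```

A transformation that looks only at element and attribute names would treat ns3 as unused here.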

3 Answers


Unlike the answer of @Martin-Honnen, this solution produces exactly the desired result -- the necessary namespace nodes remain where they are and are not moved down.

Also, this solution correctly deals with attributes that are in a namespace:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*" priority="-2">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
   <xsl:variable name="vtheElem" select="."/>

   <xsl:for-each select="namespace::*">
     <xsl:variable name="vPrefix" select="name()"/>

     <xsl:if test=
      "$vtheElem/descendant::*
              [(namespace-uri()=current()
             and 
              substring-before(name(),':') = $vPrefix)
             or
              @*[substring-before(name(),':') = $vPrefix]
              ]
      ">
      <xsl:copy-of select="."/>
     </xsl:if>
   </xsl:for-each>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the provided XML document with an added namespaced attribute):

<ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com" xmlns:ns3="http://www.c.com" xmlns:ns4="http://www.d.com">
    <ns1:Body ns2:x="1">
        <ns2:a>
            <ns2:b>data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope>

the desired, correct result is produced:

<ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com">
   <ns1:Body ns2:x="1">
      <ns2:a>
         <ns2:b>data1</ns2:b>
         <ns2:c>data2</ns2:c>
      </ns2:a>
   </ns1:Body>
</ns1:Envelope>
Leviathan
Dimitre Novatchev
  • @mdiez: There is a problem with namespaces... Some implementations don't handle the XPath `namespace` axis. –  Jan 04 '11 at 17:51
  • To expand on the comment about implementations that do not support the namespace axis: with those, the namespaces get "pushed down" to each element that uses them. This applies, for example, to Java's default TransformerFactory, whereas the Saxon implementation handles it correctly. – Leviathan Nov 10 '17 at 15:50
  • If you would like to preserve namespaces that occur only in attribute *values* and are therefore not syntactically necessary, see @Gentil's answer below. – Leviathan Nov 10 '17 at 16:07
  • @Leviathan, I wouldn't recommend trying to guess whether the string value of an attribute is a QName or just happens to be a syntactically-valid QName -- in the general case this is just guessing. One could use schema information, if it is known that the XML document is an instance of a given schema. – Dimitre Novatchev Nov 10 '17 at 16:32
  • I see your point, but in a situation where you just want to strip as many namespaces as possible without side effects, this approach should be a fair compromise, since any wrong guess only leaves a harmless superfluous namespace. If you do not look for namespaces in attribute values, though, you may end up removing a namespace that was actually necessary. This obviously depends on the kind of XML you are parsing. – Leviathan Nov 10 '17 at 17:06
  • Unfortunately, this solution does not work when a default namespace is applied: the result has a namespace declaration on every tag. – Xyaren Jun 29 '20 at 19:25
  • @Xyaren, why do you think this is a problem? The task is to remove the **unused** namespaces. The default namespace is used -- everywhere -- so it is fine that it is not removed. In fact, it is not possible to remove **used** namespaces (that is, namespaces that contain elements of the document), because doing so would modify the document. – Dimitre Novatchev Jun 29 '20 at 20:54
  • Never mind, it was a bug in the XML processor of my Java version. Switching to Saxon fixed the issue. – Xyaren Jun 30 '20 at 21:12
  • Update: I just ran into this bug again, and it seems that this transformation does not work with apache-xalan, the default Java transformation library included in the SDK. Adding the Saxon-HE dependency (https://mvnrepository.com/artifact/net.sf.saxon/Saxon-HE) produces the desired results. – Xyaren Mar 10 '21 at 12:36
  • @Xyaren This means that apache-xalan is buggy. The transformation in this answer is standard (no extensions) XSLT 1.0 and should produce the same results with any compliant XSLT 1.0 processor. – Dimitre Novatchev Mar 10 '21 at 15:46

Well, if you use

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{name()}" namespace="{namespace-uri()}">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

then the unused namespaces are removed, but the result is more likely to look like

<ns1:Envelope xmlns:ns1="http://www.a.com">
    <ns1:Body>
        <ns2:a xmlns:ns2="http://www.b.com">
            <ns2:b>data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope>

than what you asked for.

Martin Honnen
  • +1 Also a good answer, without the "not always implemented" `namespace` axis. But it could end up with a lot of "duplicated" namespace declarations, one on the first element in each namespace along every branch. –  Jan 04 '11 at 17:55
  • Perfectly good answer. It's semantically equivalent; the location of the namespace declarations doesn't matter. – james.garriss Jul 19 '12 at 19:31

To add to Dimitre's answer: if namespaces whose prefixes occur only in attribute values should also be preserved, add the condition @*[contains(.,concat($vPrefix,':'))] to the test:

  <xsl:if test="$vtheElem/descendant::*
                  [(namespace-uri() = current() and
                    substring-before(name(),':') = $vPrefix) or
                   @*[substring-before(name(),':') = $vPrefix] or
                   @*[contains(.,concat($vPrefix,':'))]
                  ]">

This preserves the namespace ns3 in the following example, because the prefix occurs in the attribute value attrib="ns3:Header":

 <ns1:Envelope xmlns:ns1="http://www.a.com" xmlns:ns2="http://www.b.com" xmlns:ns3="http://www.c.com" xmlns:ns4="http://www.d.com">
    <ns1:Body ns2:x="1">
        <ns2:a>
            <ns2:b attrib="ns3:Header">data1</ns2:b>
            <ns2:c>data2</ns2:c>
        </ns2:a>
    </ns1:Body>
</ns1:Envelope>
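
For reference, here is a sketch of what the complete stylesheet might look like with this condition merged into Dimitre's transformation. Nothing in it is new apart from the comments; it only combines the two pieces shown above. Like the original, it inspects only descendants of each element, and the prefix match on attribute values is a heuristic that can keep a namespace that is not really needed:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Fallback: copy attributes, text, comments and processing instructions -->
 <xsl:template match="node()|@*" priority="-2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Rebuild each element and copy only the namespace nodes judged to be used -->
 <xsl:template match="*">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
   <xsl:variable name="vtheElem" select="."/>

   <xsl:for-each select="namespace::*">
    <xsl:variable name="vPrefix" select="name()"/>

    <!-- Keep the namespace if a descendant element name, attribute name,
         or attribute value (heuristic) uses its prefix -->
    <xsl:if test="$vtheElem/descendant::*
                    [(namespace-uri() = current() and
                      substring-before(name(),':') = $vPrefix) or
                     @*[substring-before(name(),':') = $vPrefix] or
                     @*[contains(.,concat($vPrefix,':'))]
                    ]">
     <xsl:copy-of select="."/>
    </xsl:if>
   </xsl:for-each>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the example above with a processor that supports the namespace axis, this should keep the declarations for ns1, ns2 and ns3 on the root element and drop ns4.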
Leviathan