
I am looking for a way to delete all duplicate nodes from an XML file without hard-coding an exact name to match on. Instead, the solution should identify every duplicate node by itself and delete it. Only the first node should remain, and all later duplicates should be deleted.

I have read a couple of similar posts:

XSL - remove the duplicate node but keep the original

Removing duplicate elements with XSLT

Example:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<projects>
  <project id="staticproperties">
    <property name="prop1">removing this prop if its duplicate</property>
    <property name="prop2">removing this prop if its duplicate</property>
    <property name="prop3">removing this prop if its duplicate</property>
    <property name="prop4">removing this prop if its duplicate</property>
  </project>
  <project id="febrelease2013">
    <property name="prop">testing prop from pom.xml</property>
    <property name="prop1">removing this prop if its duplicate</property>
    <property name="prop3">removing this prop if its duplicate</property>
    <property name="prop1">removing this prop if its duplicate</property>
    <property name="prop5">removing this prop if its duplicate</property>
  </project>
</projects>

NOTE: the value of the name attribute in <property name="..."> could be anything.

Desired output:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<projects>
  <project id="staticproperties">
    <property name="prop1">removing this prop if its duplicate</property>
    <property name="prop2">removing this prop if its duplicate</property>
    <property name="prop3">removing this prop if its duplicate</property>
    <property name="prop4">removing this prop if its duplicate</property>
  </project>
  <project id="febrelease2013">
    <property name="prop">testing prop from pom.xml</property>
    <property name="prop5">removing this prop if its duplicate</property>
  </project>
</projects>

1 Answer

One way to do this in XSLT 1.0 is Muenchian grouping: index every <property> element by its @name attribute with xsl:key, then copy only the first <property> for each name (in document order) and suppress the rest.

For example, when this XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

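  <!-- Index every <property> in the document by its @name value. -->
  <xsl:key name="kPropertyByName" match="property" use="@name"/>

  <!-- Identity transform: copy all nodes and attributes unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: output nothing for a <property> that is not the
       first <property> (in document order) with the same @name. -->
  <xsl:template
    match="property[
             not(
               generate-id() =
               generate-id(key('kPropertyByName', @name)[1])
             )
           ]"/>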

</xsl:stylesheet>

...is applied to the provided XML, the desired result is produced:

<?xml version="1.0" encoding="UTF-8"?>
<projects>
  <project id="staticproperties">
    <property name="prop1">removing this prop if its duplicate</property>
    <property name="prop2">removing this prop if its duplicate</property>
    <property name="prop3">removing this prop if its duplicate</property>
    <property name="prop4">removing this prop if its duplicate</property>
  </project>
  <project id="febrelease2013">
    <property name="prop">testing prop from pom.xml</property>
    <property name="prop5">removing this prop if its duplicate</property>
  </project>
</projects>
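If XSLT is not a hard requirement, the same rule that the key expresses (keep the first <property> for each @name value across the whole document, drop every later one) can be sketched in Python with the standard library's xml.etree.ElementTree. This is a swapped-in alternative, not part of the XSLT answer. The function name remove_duplicate_properties is hypothetical, and the sketch assumes every <property> carries a name attribute.

```python
import xml.etree.ElementTree as ET


def remove_duplicate_properties(root: ET.Element) -> ET.Element:
    """Keep only the first <property> for each @name value, document-wide.

    Hypothetical helper mirroring the XSLT key above: a <property> survives
    only if it is the first <property> in document order with that @name.
    Assumes every <property> has a name attribute.
    """
    # ElementTree elements have no parent pointer, so build child -> parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    seen = set()
    # root.iter() walks in document order; list() snapshots it before mutating.
    for prop in list(root.iter("property")):
        name = prop.get("name")
        if name in seen:
            parent_of[prop].remove(prop)  # later duplicate: drop it
        else:
            seen.add(name)  # first occurrence: keep it
    return root
```

To process a file, parse it with `ET.parse(...)`, pass `tree.getroot()` to the function, and write the result with `tree.write(path, encoding="utf-8", xml_declaration=True)`. Removing an element also removes the whitespace that followed it, so on Python 3.9+ you may want to call `ET.indent(tree)` before writing.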