I am not sure if this is possible. Is it possible, using XSLT (ideally XSLT 1.0), to go through an XML file and remove a term if it is the plural of another term? I have this:

<?xml version="1.0" encoding="utf-8"?>
<Zthes>
<term>
<termId>35518385342487469049732</termId>
<termUpdate>add</termUpdate>
<termName>Doctor</termName>
<termType>PT</termType>
<termStatus>active</termStatus>
<termApproval>candidate</termApproval>
<termCreatedDate>20121217T11:47:45</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20121217T11:48:39</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
</term>
<term>
<termId>19229419919329134598161</termId>
<termUpdate>add</termUpdate>
<termName>Doctors</termName>
<termType>ND</termType>
<termStatus>active</termStatus>
<termApproval>candidate</termApproval>
<termCreatedDate>20121217T11:48:14</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20121217T11:48:14</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
<relation>
<relationType>USE</relationType>
<termId>35518385342487469049732</termId>
<termName>Doctor</termName>
</relation>
</term>
<term>
<termId>179468269297128829432204</termId>
<termUpdate>add</termUpdate>
<termName>Medical Centre</termName>
<termType>PT</termType>
<termStatus>active</termStatus>
<termApproval>candidate</termApproval>
<termCreatedDate>20121217T11:48:31</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20121217T11:48:53</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
</term>
<term>
<termId>109697087683409264068424</termId>
<termUpdate>add</termUpdate>
<termName>Hospitals</termName>
<termType>ND</termType>
<termStatus>active</termStatus>
<termApproval>candidate</termApproval>
<termCreatedDate>20121217T11:48:53</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20121217T11:48:53</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
<relation>
<relationType>USE</relationType>
<termId>179468269297128829432204</termId>
<termName>Medical Centre</termName>
</relation>
 </term>
</Zthes>

I'd like to look at the <termName> of each <term> whose <termType> is ND, and compare it with the <termName> in its <relation> section. If the only difference between them is that one of them ends in 's', delete the <term> with type ND:

<?xml version="1.0" encoding="utf-8"?>
<Zthes>
<term>
<termId>35518385342487469049732</termId>
<termUpdate>add</termUpdate>
<termName>Doctor</termName>
<termType>PT</termType>
<termStatus>active</termStatus>
<termApproval>candidate</termApproval>
<termCreatedDate>20121217T11:47:45</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20121217T11:48:39</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
</term>
<term>
<termId>179468269297128829432204</termId>
<termUpdate>add</termUpdate>
<termName>Medical Centre</termName>
<termType>PT</termType>
<termStatus>active</termStatus>
<termApproval>candidate</termApproval>
<termCreatedDate>20121217T11:48:31</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20121217T11:48:53</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
</term>
<term>
<termId>109697087683409264068424</termId>
<termUpdate>add</termUpdate>
<termName>Hospitals</termName>
<termType>ND</termType>
<termStatus>active</termStatus>
<termApproval>candidate</termApproval>
<termCreatedDate>20121217T11:48:53</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20121217T11:48:53</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
<relation>
<relationType>USE</relationType>
<termId>179468269297128829432204</termId>
<termName>Medical Centre</termName>
</relation>
 </term>
</Zthes>

Is XSLT the best approach for this? I am way out of my depth here to be honest. Thanks.

lobe
  • It's not really the purpose of XSLT, since XSLT is supposed to be used to transform the content structure but not the content itself. Don't forget that the "S" of XSLT stands for "Stylesheet" – Charles-Édouard Coste Dec 17 '12 at 12:02
  • @Charles-EdouardCoste, Your statement isn't true. XSLT has been used for a wide variety of text-processing tasks -- from spelling checking to building concordances to lexers and parsers for complex programming languages. – Dimitre Novatchev Dec 17 '12 at 15:06
  • Supporting a feature doesn't mean that it was designed to... Actually, you can register any function you like in some XSLT processors. But this doesn't mean that you should use any kind of function for any purpose. By the way, you're actually right: XSLT (or more accurately, XPath) provides a huge variety of text-processing functions. But would you adapt your XSLT files for each type of natural language? – Charles-Édouard Coste Dec 17 '12 at 17:34

1 Answer

If appending an 's' is enough to detect the plural for all terms used, then

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="term[termType = 'ND' and concat(relation/termName, 's') = termName]"/>

</xsl:stylesheet>

might do. It also assumes there is only a single relation/termName inside a term, or that only the first one is relevant: in XPath 1.0, concat() converts a node-set argument to the string value of its first node.

If there can be several then perhaps

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="term[termType = 'ND' and relation/termName[concat(., 's') = ../../termName]]"/>

</xsl:stylesheet>

is more appropriate.
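
The question also says that either of the two names may be the one ending in 's'. A minimal sketch covering both directions, assuming the same structure as the sample input and that a plain trailing 's' is the only plural form to catch, could look like this:

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<!-- Identity template: copy everything that no other template matches. -->
<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<!-- Drop an ND term when one of its relation/termName values differs from
     the term's own termName only by a trailing 's', in either direction.
     Assumption: the comparison is case-sensitive. -->
<xsl:template match="term[termType = 'ND' and relation/termName[
    concat(., 's') = ../../termName or
    . = concat(../../termName, 's')]]"/>

</xsl:stylesheet>
```

Applied to the sample input, this removes the Doctors term and keeps the Hospitals term, because neither 'Medical Centres' nor 'Hospitalss' matches the other name. Irregular plurals (for example '-ies' or '-es') are not handled.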

Martin Honnen