
I'm experimenting with XSLT2, using a stylesheet based on this answer:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
 <xsl:template match="source/text()">
  <xsl:sequence select="replace(., '&lt;.*?&gt;', '<ph>$0</ph>')"/>
 </xsl:template>
</xsl:stylesheet>

which is intended to do multiple replacements, e.g. from:

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:xliff="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
  <file>
    <source>abc &lt;field1&gt; def &lt;field2&gt; ghi</source>
  </file>
</xliff>

to:

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:xliff="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
  <file>
    <source>abc <ph>&lt;field1&gt;</ph> def <ph>&lt;field2&gt;</ph> ghi</source>
  </file>
</xliff>

However, my transform is not valid; I get this error:

Error on line 12 column 54 of my.xsl:
  SXXP0003: Error reported by XML parser: The value of attribute "select" associated with an
  element type "null" must not contain the '<' character.

If I use select="replace(., '&lt;(.*?)&gt;', '&lt;ph&gt;F&lt;/ph&gt;')" then I get ...&lt;ph&gt;... in the output.

If I use disable-output-escaping (DOE) I introduce other problems, because there might be other entities in the field that I want to leave untouched. If I use <xsl:output method="text"/> I lose most of my XML. Is there some other way of 'mixing and matching' like this?


2 Answers


The problem is here:

<xsl:sequence select="replace(., '&lt;(.*?)&gt;', '<ph>F</ph>')"/>

A well-formed XML document cannot contain the < character in an attribute value.

In this particular case, the select attribute above contains the substring <ph>F</ph>, so the stylesheet cannot even be parsed as an XML document.

More importantly, elements cannot be generated by string replacement: the result is just a string (containing a serialized representation of the element), not an element node.
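To illustrate that last point, here is a minimal sketch of the escaped variant, which parses but still does not create elements. The `$1` back-reference in the replacement string is an assumption about the intended replacement; the rest follows the stylesheet in the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Identity template, as in the question. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="source/text()">
    <!-- Well-formed, because every markup character inside the
         attribute value is escaped. $1 (an assumption here) refers
         to the first captured group. replace() returns an xs:string,
         so no ph element node is ever created. -->
    <xsl:sequence
        select="replace(., '&lt;(.*?)&gt;', '&lt;ph&gt;$1&lt;/ph&gt;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the result is a string, the serializer escapes its angle brackets on output, which would explain the ...&lt;ph&gt;... seen in the question.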

Here is how to achieve what you want:

 <xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="source/text()">
  <xsl:analyze-string select="." regex="&lt;(.*?)&gt;">
    <xsl:matching-substring>
      <ph><xsl:value-of select="regex-group(1)"/></ph>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
     <xsl:sequence select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>

When this transformation is applied to the provided XML document:

<xliff xmlns:xliff="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
    <file>
        <source>abc &lt;field1&gt; def &lt;field2&gt; ghi</source>
    </file>
</xliff>

the following result is produced (the angle brackets are not part of the ph content, because only regex-group(1) is output):

<xliff xmlns:xliff="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
      <file>
            <source>abc <ph>field1</ph> def <ph>field2</ph> ghi</source>
      </file>
</xliff>

Explanation: Appropriate use of the XSLT 2.0 instructions <xsl:analyze-string>, <xsl:matching-substring>, <xsl:non-matching-substring> and regex-group()
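The target output in the question keeps the angle brackets inside each ph element. If that is what is wanted, a minimal variant of the above (a sketch, not tested against a particular processor) outputs the whole matched substring instead of regex-group(1). The capture group is then unnecessary, so it is dropped here.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Identity template: copy everything not handled below. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="source/text()">
    <xsl:analyze-string select="." regex="&lt;.*?&gt;">
      <xsl:matching-substring>
        <!-- Inside matching-substring, "." is the whole matched
             substring, delimiters included. -->
        <ph><xsl:value-of select="."/></ph>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text between matches is passed through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Each ph is a real element node here, while the matched text, delimiters included, becomes its text content and is escaped by the serializer in the usual way. Note that the regex attribute is an attribute value template, so any curly braces in a regex would have to be doubled.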

Dimitre Novatchev

If the string &lt; appears in your source document, then the XDM tree representation of the document will contain the character '<' in its place, which will match the regex '<', which is written in your stylesheet as &lt;.

So it should work, but you've obviously done something wrong. Show us what you did, and we might have a chance to tell you where you went wrong. Telling us you ran into problems is not much use if you don't tell us what the problems were.
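The matching behaviour described above can be checked with a small stylesheet along these lines. This is only a sketch: it assumes the source elements are in no namespace, as in the sample document, and uses text output purely for the check.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The XML parser has already turned the escaped brackets in
         the source into plain angle-bracket characters, so the
         regex (itself written with escapes in this attribute)
         sees and matches ordinary angle brackets. -->
    <xsl:for-each select="//source">
      <xsl:value-of select="matches(., '&lt;.*?&gt;')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document from the question, this should print true for the single source element.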

Michael Kay
  • Thanks for your response Michael - I've updated the question to show exactly what I've tried. The 'search' bit with the regex works fine as you say it should - but I'm having trouble with the 'replace' side –  Jan 18 '12 at 12:50
  • The problem is that @Jack-Douglas wants to generate elements simply by string replacement -- this is not possible. Elements are nodes and cannot be created as substrings of a string. The solution is to use `<xsl:analyze-string>` -- as in my answer. – Dimitre Novatchev Jan 18 '12 at 14:02