How do I test for special characters using Schematron tests?

Question

I am trying to set up a Schematron test that checks for special characters in XML...

More specifically, I would like to throw a warning where there is an occurrence of the copyright symbol (Unicode U+00A9).

It seems that the Schematron XML file cannot be parsed when I use any of the following notations in the rules...

<iso:rule context="myelement">
   <iso:report test="matches(., '\u00A9')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule> 

<iso:rule context="myelement">
   <iso:report test="matches(., '\u{00A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule> 

<iso:rule context="myelement">
   <iso:report test="matches(., '\u{A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule> 

<iso:rule context="myelement">
   <iso:report test="matches(., '\x{00A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>

Do any Schematron experts out there know how to embed a Unicode character in a regex?

Thanks in advance...

score 1 · Accepted Answer · answered Feb 03 '13 at 14:10

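XPath regular expressions follow the XML Schema regex syntax, which has no \u or \x{...} escapes. Write the character as an XML numeric character reference instead; the XML parser resolves it to the actual character before the expression is evaluated: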

<?xml version="1.0" encoding="UTF-8"?>
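<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <!-- matches() is an XPath 2.0 function, hence queryBinding="xslt2";
         the pattern id must be a valid XML ID, so it cannot contain spaces -->
    <iso:pattern id="unicode-in-regex">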
        <iso:rule context="a">
            <iso:report test="matches(., '&#xa9;')">
                Copyright found
            </iso:report>
        </iso:rule>
    </iso:pattern>
</iso:schema>

Output in XML ValidatorBuddy
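
If a plain substring check is enough, a regex is not strictly needed. The following is a minimal sketch, not a tested drop-in: it reuses the element name myelement and the {ES1037} message from the question, and swaps matches() for contains(), which is an XPath 1.0 function and so should not depend on an XSLT 2 query binding. The role="warning" attribute is an assumption about your tooling; whether a role value is mapped to a warning severity depends on the validator.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron">
    <iso:pattern id="copyright-check">
        <!-- "myelement" is the element name used in the question -->
        <iso:rule context="myelement">
            <!-- The XML parser resolves the character reference to the
                 copyright sign before XPath sees the expression, so the
                 test compares against a plain one-character literal. -->
            <!-- role="warning" is an assumption: check how your validator
                 interprets role values. -->
            <iso:report test="contains(., '&#xA9;')" role="warning">{ES1037} Copyright Symbol Detected</iso:report>
        </iso:rule>
    </iso:pattern>
</iso:schema>
```

Because the string value of the context node includes the text of its descendants, this reports the symbol anywhere inside myelement, not only in its direct text.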
