I receive the following XML as the body of an HTTP POST request, which I process in XQuery 3.1 (eXist-db 5.2):

<request id="foo">
     <p>The is a description with a line break&lt;br/&gt;and another linebreak&lt;br/&gt;and
            here is an ampersand&amp;.</p>
</request>

My objective is to take the node <p> and insert it into a TEI file in eXist-db. If I just insert the fragment as-is, no errors are thrown.

However, I need to transform every instance of the string &lt;br/&gt; into an <lb/> element before adding it to the TEI document. I tried to do that with fn:parse-xml.

Applying the following, however, throws an error on &amp;, which surprises me:

let $xml := <request id="foo">
                 <p>The is a description with a line break&lt;br/&gt;and 
                    another linebreak&lt;br/&gt;and here is an ampersand&amp;.</p>
           </request>
let $newxml := <p>{replace($xml//p/text(),"&lt;br/&gt;","&lt;lb/&gt;")}</p>
return <p>{fn:parse-xml($newxml)}</p>

error:

Description: err:FODC0006 String passed to fn:parse-xml is not a well-formed XML document.: Document is not valid.
Fatal : The entity name must immediately follow the '&' in the entity reference.

If I remove &amp; the fragment parses just fine. Why is this producing an error if it is legal XML? How can I achieve the needed result?

Many thanks in advance.

P.S. I am open to both XQuery and XSLT solutions.

jbrehr

1 Answer

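It seems that the issue is the bare ampersand rather than the XML you received. When XQuery evaluates the element constructor, the entity references &lt;, &gt; and &amp; are resolved to the literal characters <, > and &, so the string passed to fn:parse-xml() contains an unescaped & and is no longer well-formed XML.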

Use util:parse-html() instead of fn:parse-xml().

let $xml := <request id="foo">
                  <p>The is a description with a line break&lt;br/&gt;and 
                    another linebreak&lt;br/&gt;and here is an ampersand&amp;.</p>
           </request>
return <p>{util:parse-html($xml/p/text())/HTML/BODY/node()}</p>
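
If the only markup to recover is the escaped &lt;br/&gt; string, a parser-free alternative may be worth considering. The following is a minimal sketch, not tested against eXist-db 5.2, that splits the text with fn:tokenize() and places an <lb/> element between the pieces. The \s* in the pattern (to tolerate a variant such as "<br />") and the no-namespace <lb/> are assumptions made for illustration.

```xquery
xquery version "3.1";

let $xml := <request id="foo">
                <p>The is a description with a line break&lt;br/&gt;and
                   another linebreak&lt;br/&gt;and here is an ampersand&amp;.</p>
            </request>
return
    <p>{
        (: In the string value of <p>, the entities are already resolved,
           so the text contains the literal characters <br/> and &.
           The pattern below is the regex <br\s*/> once XQuery has read it. :)
        for $part at $i in tokenize($xml/p, "&lt;br\s*/&gt;")
        return (
            (: Emit a line break before every piece except the first.
               Assumption: for a TEI document, <lb/> would need to be in
               the TEI namespace, http://www.tei-c.org/ns/1.0 :)
            if ($i gt 1) then <lb/> else (),
            text { $part }
        )
    }</p>
```

Because the ampersand is never re-parsed, it stays an ordinary character in the text node and is escaped again only when the result is serialized. The trade-off is that this handles a single known tag, whereas a parser-based approach handles arbitrary embedded markup.
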
Mads Hansen
  • This has the merit of working, although it wraps everything in HTML body, etc. But from there I can transform to final TEI XML at least! – jbrehr Jul 06 '20 at 13:38
  • Right, you can XPath into it to select just the markup. I'll update the answer. – Mads Hansen Jul 06 '20 at 17:03