5

Oracle 11g. I figured out that if I add NOENTITYESCAPING to the XMLELEMENT function, it nicely turns off entity escaping. However, when I then pass the result to EXTRACT the escaping seems to come back again.

select xmlelement(NOENTITYESCAPING e,id,'->') 
  from (select level as id 
          from dual 
       connect by level < 6)      

XMLELEMENT(NOENTITYESCAPINGE,ID,'->')
---------------------------------------
<E>1-></E>
<E>2-></E>
<E>3-></E>
<E>4-></E>
<E>5-></E>

Now, adding EXTRACT:

select xmlelement(NOENTITYESCAPING e,id,'->').extract('//text()')
  from (select level as id 
          from dual 
       connect by level < 6)

XMLELEMENT(NOENTITYESCAPINGE,ID,'->').EXTRACT('//TEXT()')
----------------------------------------------------------
1-&gt;
2-&gt;
3-&gt;
4-&gt;
5-&gt;

Any fixes/workarounds to keep the escaping switched off? The manual gives no help.

Nick Krasnov
  • 26,886
  • 6
  • 61
  • 78
TrojanName
  • 4,853
  • 5
  • 29
  • 41

1 Answers1

13

Try to use extractvalue() function, which unescapes encoded entities, instead of extract(). Here is an example:

clear screen;
column res format a20;

-- depending on a situation, NOENTITYESCAPING might be dropped

select extractvalue(
                     xmlelement(NOENTITYESCAPING e,id,'->')
                    , '//text()'
                    ) as res
  from (select level as id 
          from dual 
       connect by level < 6)

Result:

RES                
--------------------
1->                  
2->                  
3->                  
4->                  
5->    

But the use of extractvalue() function may be limited by the fact that it can return value of only one node. In a case of returning values of multiple nodes the utl_i18n package, and unescape_reference() function of that package can be used to unescape encoded entities:

clear screen;
column res format a20;

select utl_i18n.unescape_reference(xmlelement(root
                                             , xmlelement(node1, '>')
                                             , xmlelement(node2, '<')
                                             ).extract('//text()').getstringval()
                                   ) as res
 from dual
connect by level <= 3;

Result:

RES                
--------------------
><                   
><                   
>< 

Yes, as utl_i18n.unescape_reference() function accepts only values of varchar2 data type and types that can be implicitly converted to the varchar2 data type, your hands are tied when it comes to processing large "strings". In this situation you may turn to dbms_xmlgen package and convert() function in particular, which has an overloaded version capable of accepting CLOBs. Here is an example:

select dbms_xmlgen.convert(
                           xmlagg(xmlelement(root
                                             , xmlelement(node1, '>')
                                             , xmlelement(node2, '<')
                                             )
                                  ).extract('//text()').getclobval()
                          , 1) as res
 from dual
connect by level <= 3000;   -- 1 (second parameter of the convert() function)
                            -- instructs function to decode entities  

Result:

RES
------------------------------------------------------
><><><><><><><><><><><><><><><><><><><><><><><><><>
-- ... the rest of the CLOB
Nick Krasnov
  • 26,886
  • 6
  • 61
  • 78
  • Thanks Nicholas. I see you updated your answer as I was just about to comment that as I'm using XMLAGG (to work around LISTAGG's 4000 char limit), and that extractvalue won't work with that. I've tried your recommendation of utl_i18n.unescape_reference, however I think that's hitting the 4000 char limit too. – TrojanName May 16 '14 at 13:08
  • Here's my current query, which demonstrates the problem: select rtrim(xmlagg(xmlelement( e,id,'->').extract('//text()') order by id).GetClobVal(),',') from (select level as id from dual connect by level < 6) – TrojanName May 16 '14 at 13:09
  • 2
    @TrojanName `I think that's hitting the 4000 char limit too` Yes. `unescape_reference()` function accepts `varchar2` values. To process "*big*" strings use `dbms_xmlgen.convert()` function. Answer has been updated. – Nick Krasnov May 16 '14 at 13:29
  • Wow, phenomenal answer! Thank you so much. Oracle has so many ways of doing things, probably too many at times. :-) – TrojanName May 16 '14 at 14:35
  • Thanks, this was very useful to me and saved me using the undocumented NOENTITYESCAPING. XML in oracle really is a big topic! – kayakpim Jul 21 '16 at 13:00