5

Is there a way to prevent .NET's XmlReader class from expanding XML entities into their value when reading the content?

For instance, suppose the following XML is used as input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE author PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML" "http://www.oasis-open.org/docbook/xmlcharent/0.3/iso-lat1.ent" >
<author>&aacute;</author>

Let's assume it is not possible to reach the external OASIS DTD needed for the expansion of the aacute entity. I would like the reader to read, in sequence, the author element, then the aacute node of type EntityReference, and finally the author end element, without throwing any errors. How can I achieve this?

UPDATE: I also want to prevent the expansion of character entities such as &#x00E1;.

Gabriel S.

2 Answers

1

One way to do that is to use `XmlTextReader`, like this:

using (var reader = new XmlTextReader(@"your url"))
{
    // note this
    reader.EntityHandling = EntityHandling.ExpandCharEntities;
    while (reader.Read())
    {
        // here it will be EntityReference with no exceptions
    }
}

If that is not an option, you can do the same with XmlReader, but some reflection will be required (at least I'm not aware of another way):

using (var reader = XmlReader.Create(@"your url", new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Ignore // or Parse
}))
{
    // set the internal property that has the same function as EntityHandling on XmlTextReader
    // (needs using System.Reflection; it is an implementation detail and may change between versions)
    reader.GetType()
          .GetProperty("EntityHandling", BindingFlags.Instance | BindingFlags.NonPublic)
          .SetValue(reader, EntityHandling.ExpandCharEntities);
    while (reader.Read())
    {
        // here it will be EntityReference with no exceptions
    }
}
Evk
  • This is quite close to what I'm looking for, but I'd also like to prevent the expansion of char entities. The values of the EntityHandling enum do not cover this case either. – Gabriel S. Mar 13 '17 at 14:29
  • @GabrielS I understand that you might not want to expand entities, but why not expand char entities? – Evk Mar 13 '17 at 14:51
  • The application I'm working on needs to take control over the expansion of all XML entities, as it performs some custom processing for each of them. That's why it needs to receive unexpanded entity references from `XmlReader` instead of the expanded result. – Gabriel S. Mar 13 '17 at 15:03
  • I'm not sure that is possible with XmlReader or XmlTextReader. Maybe if you create a custom XmlReader and delegate most of the functionality to the default one... – Evk Mar 13 '17 at 15:35
1

XML parsing can be dangerous. In some cases it leads to CVEs and denial-of-service attacks.

For example, CVE-2016-3255.

It was also discussed at Black Hat EU 2013.

The most interesting document is MLDTDEntityAttacks, which provides implementations and recommendations for developers.

Retrieve resources:

<!DOCTYPE roottag [
 <!ENTITY windowsfile SYSTEM "file:///c:/boot.ini">
]>
<roottag>
 <sometag>&windowsfile;</sometag>
</roottag>

DoS:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE root
  [
  <!ENTITY a0 "test" >
  <!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
  <!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
  <!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
  <!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
  ]>
<root>&a4;</root>
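
As a hedged illustration of how such payloads can be refused on the .NET side, here is a minimal sketch that parses with `DtdProcessing.Prohibit` and no resolver. The class name `SafeXmlDemo` and the helper `TryParse` are made up for this example; the settings themselves are standard `XmlReaderSettings` members.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SafeXmlDemo
{
    // Hypothetical helper: returns true if the document parses under hardened settings.
    public static bool TryParse(string xml)
    {
        var settings = new XmlReaderSettings
        {
            // Reject any document that contains a DOCTYPE declaration.
            DtdProcessing = DtdProcessing.Prohibit,
            // Never fetch external resources (DTDs, external entities).
            XmlResolver = null
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            // Thrown when a DTD is encountered or the XML is malformed.
            return false;
        }
    }

    public static void Main()
    {
        const string withDtd =
            "<!DOCTYPE root [<!ENTITY a0 \"test\">]><root>&a0;</root>";
        const string plain = "<root>test</root>";

        Console.WriteLine(TryParse(withDtd)); // expected to be rejected
        Console.WriteLine(TryParse(plain));   // expected to be accepted
    }
}
```

Note that this rejects every document with a DOCTYPE, including the one in the question, so it is a defensive default rather than a way to keep entity references unexpanded. If DTDs must be parsed, `XmlReaderSettings.MaxCharactersFromEntities` can cap the expansion size instead.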

Back to your question.
As @Evk wrote, by setting EntityHandling you can prevent the expansion of all entities except character entities.

I don't know of a way to prevent character entities from being expanded other than writing your own XmlReader implementation.

I think you also want to prevent the expansion of &amp; &apos; &lt; &gt; &quot;.

FYI, here is how and where XmlTextReader parses character entities:

XmlTextReader
ParseElementContent
& case
ParseText
Char entity case
ParseCharRefInline

This function finally parses numeric character references (e.g. &#32; and &#x00E1;):
ParseNumericCharRefInline


This function parses named character entity references (&amp; &apos; &lt; &gt; &quot;):
ParseNamedCharRef

galakt
  • Although your links to the exact pieces of code handling entities provide helpful insights, rewriting the `XmlTextReaderImpl` is not really an option for me. Neither do I see any possibility to inherit and override this specific portion from this class, especially since it's not even public. The only option I see right now is pre-processing the XML to change the entities into some special text that is no longer seen as entity references by the `XmlReader`, but which I can then process unexpanded. – Gabriel S. Mar 25 '17 at 22:33
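
The pre-processing idea from the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `EntityMasking`, its `Mask` method, and the private-use marker characters U+E000 and U+E001 are all invented for this example and are assumed not to occur in the real input. The regular expression is deliberately naive and would also rewrite entity-like text inside CDATA sections, comments, and DTD entity declarations.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class EntityMasking
{
    // Hypothetical markers from the Unicode private use area (valid XML characters).
    private const string Open = "\uE000";
    private const string Close = "\uE001";

    // Matches &name;, &#123; and &#x1F; style references (naive, not a full XML Name check).
    private static readonly Regex EntityRef =
        new Regex(@"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9_:.\-]*);");

    // Finds masked references again after parsing.
    private static readonly Regex Masked =
        new Regex(Open + "([^" + Close + "]+)" + Close);

    // Replaces every reference with marker-delimited text the parser treats as plain characters.
    public static string Mask(string xml)
    {
        return EntityRef.Replace(xml, m => Open + m.Groups[1].Value + Close);
    }

    public static void Main()
    {
        const string xml = "<author>&aacute; &#x00E1; &amp;</author>";

        using (var reader = XmlReader.Create(new StringReader(Mask(xml))))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Text) continue;

                // Each match is one unexpanded reference, ready for custom processing.
                foreach (Match m in Masked.Matches(reader.Value))
                {
                    Console.WriteLine(m.Groups[1].Value);
                }
            }
        }
    }
}
```

With the sample input, this should print `aacute`, `#x00E1`, and `amp` on separate lines. Because the reader never sees an entity reference, no DTD lookup is needed for the masked names, but the application has to split the text at the markers itself.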