I am trying to load XML content using XmlTextReader, but XmlTextReader appears to ignore the DtdProcessing setting while processing the XML. The setting works fine if I use XmlReader instead. The problem with XmlReader is that it automatically normalizes \r\n to \n, which I don't want in my output string.

Here is my code snippet:

XmlDocument xmlDocument = new XmlDocument();

string contents = @"<?xml version='1.0' encoding='ISO-8859-1' standalone='yes'?>
    <!DOCTYPE content [<!ENTITY ouml '&#246;'>]>
    <content>Test &ouml; Test

    Test</content>";

byte[] byteArray = Encoding.UTF8.GetBytes(contents);
MemoryStream stream = new MemoryStream(byteArray);

//XmlReaderSettings settings = new XmlReaderSettings();
//settings.DtdProcessing = DtdProcessing.Parse;
//settings.IgnoreWhitespace = false;
//XmlReader reader = XmlReader.Create(stream, settings);
//xmlDocument.Load(reader);

XmlTextReader reader = new XmlTextReader(stream);
reader.DtdProcessing = DtdProcessing.Parse;
xmlDocument.Load(reader);

Console.WriteLine(xmlDocument.OuterXml);

This is the output I get from the code above:

"<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"yes\"?><!DOCTYPE content[<!ENTITY ouml '&#246;'>]><content>Test &ouml; Test\r\n\r\n    Test</content>"

Instead, I want the output string with the DTD processed, that is, with the entity reference expanded:

"<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"yes\"?><!DOCTYPE content[<!ENTITY ouml '&#246;'>]><content>Test ö Test\r\n\r\n    Test</content>"
Firoz Ansari
  • Why do you care what the serialized XML looks like? There's no difference between a literal `ö` and `&ouml;`, as long as `ouml` is defined. – Tomalak Dec 07 '16 at 19:36
  • Thank you Tomalak. I do want a literal `ö` in my output instead of `&ouml;`. I would appreciate any pointers. – Firoz Ansari Dec 07 '16 at 19:39
  • I understand what you want. I was asking *why*, because the two variants are equivalent. – Tomalak Dec 07 '16 at 19:43
  • We have a legacy downstream system that expects the literal variant. We were able to get it through an XmlReader object, but that has some other issues. – Firoz Ansari Dec 07 '16 at 19:48
  • I was already suspecting something horrible like *"we have a downstream application that uses regex to parse XML"*. Well... that's why the only thing that should interact with XML is an XML parser. I am not sure that there are any regular means to resolve this, but as a method of last resort you can do something equally horrible, like *"string-replace all occurrences of `&ouml;` with `ö` in the final XML string"*. – Tomalak Dec 07 '16 at 19:56
  • Why are you using a TextReader instead of a TextWriter? – jdweng Dec 07 '16 at 20:20
  • @jdweng I am not aware of using TextWriter to do DTD processing. I will be thankful if you can provide me the sample code using TextWriter. – Firoz Ansari Dec 07 '16 at 21:15
  • If TextReader does what you want except for normalizing newlines, then it might be simplest to run TextReader and filter it through sed or some equivalent, to inject \r before every \n. – C. M. Sperberg-McQueen Dec 08 '16 at 23:06
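
The last comment's idea can be tried without leaving .NET. The following is only a sketch, assuming the XmlReader route otherwise produces the desired output (as the question reports) and that every line break in the original document was \r\n; the class name `RestoreCrLfSketch` is made up for illustration. It loads through XmlReader and then re-inserts \r before each \n in the serialized string.

```csharp
using System;
using System.IO;
using System.Xml;

class RestoreCrLfSketch
{
    static void Main()
    {
        // Explicit \r\n so the line endings do not depend on how this source file is saved.
        string contents =
            "<?xml version='1.0' encoding='ISO-8859-1' standalone='yes'?>\r\n" +
            "<!DOCTYPE content [<!ENTITY ouml '&#246;'>]>\r\n" +
            "<content>Test &ouml; Test\r\n\r\n    Test</content>";

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;

        XmlDocument xmlDocument = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(contents), settings))
        {
            xmlDocument.Load(reader);
        }

        // Collapse any remaining \r\n first so existing pairs are not turned into \r\r\n,
        // then convert every \n back to \r\n.
        string output = xmlDocument.OuterXml.Replace("\r\n", "\n").Replace("\n", "\r\n");
        Console.WriteLine(output);
    }
}
```

Note that this also turns any line break that was a lone \n in the source into \r\n, so it is only appropriate when the input is known to use \r\n throughout.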

1 Answer

The code would look something like this:

using System;
using System.IO;
using System.Text;
using System.Xml;

namespace ConsoleApplication31
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            XmlDocument xmlDocument = new XmlDocument();
            try
            {
                string contents = @"<?xml version='1.0' encoding='ISO-8859-1' standalone='yes'?>
                <!DOCTYPE content [<!ENTITY ouml '&#246;'>]>
                <content>Test &ouml; Test
                Test</content>";

                // WriteString emits the text as character data, so markup characters in it are escaped.
                MemoryStream stream = new MemoryStream();
                XmlTextWriter writer = new XmlTextWriter(stream, Encoding.GetEncoding("ISO-8859-1"));
                writer.WriteString(contents);
                writer.Flush();

                // Decode the written bytes with the same encoding the writer used.
                byte[] bytes = stream.ToArray();
                Console.WriteLine(Encoding.GetEncoding("ISO-8859-1").GetString(bytes));
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

}
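
Another option that may be worth trying stays with XmlTextReader, as in the question. This is a hedged sketch rather than a confirmed fix: it assumes the unexpanded `&ouml;` comes from XmlTextReader's `EntityHandling` property, which defaults to `EntityHandling.ExpandCharEntities` and therefore leaves general entities as entity reference nodes. Setting it to `EntityHandling.ExpandEntities` asks the reader to expand them, while the reader's `Normalization` property stays at its default of false. The class name `EntityExpansionSketch` and the use of a StringReader instead of a MemoryStream are choices made for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

class EntityExpansionSketch
{
    static void Main()
    {
        // Explicit \r\n so the line endings do not depend on how this source file is saved.
        string contents =
            "<?xml version='1.0' encoding='ISO-8859-1' standalone='yes'?>\r\n" +
            "<!DOCTYPE content [<!ENTITY ouml '&#246;'>]>\r\n" +
            "<content>Test &ouml; Test\r\n\r\n    Test</content>";

        XmlDocument xmlDocument = new XmlDocument();

        using (StringReader text = new StringReader(contents))
        {
            XmlTextReader reader = new XmlTextReader(text);
            // Allow the internal DTD subset to be parsed so the entity declaration is known.
            reader.DtdProcessing = DtdProcessing.Parse;
            // Assumption: this is the setting that controls whether &ouml; is expanded.
            reader.EntityHandling = EntityHandling.ExpandEntities;
            xmlDocument.Load(reader);
        }

        Console.WriteLine(xmlDocument.OuterXml);
    }
}
```

If the assumption holds, the text node inside `content` should contain the literal character instead of the entity reference, with the original line breaks left alone; check the result against the real input before relying on it.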
jdweng