Using XhtmlTextWriter with XmlTextReader

Question

After reading this article, I have decided to update the following code (using XmlDocument) with XmlReader:

Rendering controls

public string Rendering(Control baseControl)
{
    StringBuilder stringBuilder = new StringBuilder();

    using (StringWriter stringWriter = new StringWriter(stringBuilder))
    using (XhtmlTextWriter htmlWriter = new XhtmlTextWriter(stringWriter))
    {
        baseControl.RenderControl(htmlWriter);

        return PretifyWithNewlines(stringBuilder.ToString());
    }
}

Adding newline after each node

private string PretifyWithNewlines(string minifiedMarkup)
{   
    XmlDocument xmlDocument = new XmlDocument();
    xmlDocument.XmlResolver = null;

    try
    {
        xmlDocument.LoadXml("<base>" + minifiedMarkup + "</base>");
    }
    catch // when minifiedMarkup contains the whole HTML with DTD tag defined, 
    {                                    // it throws an exception with <base>
        xmlDocument.LoadXml(minifiedMarkup);
    }

    return recursiveOperation(xmlDocument.ChildNodes)
           .Replace(Environment.NewLine + Environment.NewLine, Environment.NewLine)
           .Replace(Environment.NewLine + "<base>" + Environment.NewLine, "")
           .Replace(Environment.NewLine + "</base>" + Environment.NewLine, "");

}

Recursively traverse each node and plant new element

private static string recursiveOperation(XmlNodeList xmlNodeList)
{
    string result = "";

    foreach (XmlNode currentNode in xmlNodeList)
    {
        XmlNode clonedNode = currentNode;

        string interimMarkup = recursiveOperation(currentNode.ChildNodes);
        try
        {
            clonedNode.InnerXml = interimMarkup;
        }
        finally
        {
            result += Environment.NewLine + clonedNode.OuterXml + Environment.NewLine;
        }
    }
    return result;
}

Questions:

Is there a room for optimizing the existing code?
How would I go about directly instantiating XmlTextReader from Control, StringWriter or XhtmlTextWriter object? Or do I really need to render it as a string first then instantiate XmlTextReader?

Edit

As per Jon Skeet's answer, here is the update. The idea is to implode a newline after each element:

Before prettify:

<div class="tag"><span>text<span class="another-span"></span></span></div><div>Text<img src="some/relative/URL/" />namely</div>

After prettify:

 <div class="tag">
 <span>
 text
 <span class="another-span"></span>
 </span>
 </div>
 <div>
 Text
 <img src="some/relative/URL/" />
 namely
 </div>

Notice how span.another-span keep collapsed while everything else (with child nodes) expanded. The indentation will be asserted by Visual Studio.

Are you doing this purely for performance reasons? Do you have evidence that loading the XML is actually the slow part? The first thing I'd do with that code is replace the string concatenation with a `StringBuilder` (passed in to `recursiveOperation`, so there's only one of them). — Jon Skeet, Oct 24 '13 at 06:01
@JonSkeet, yes performance is the primary concern here. As mentioned in the article, `XmlTextReader` and `Linq to Xml` are the winners. I reckon `XmlDocument` is slow and kind of depreciated. — Annie, Oct 24 '13 at 06:12
So do you already have a benchmark test harness for this, so you can *measure* the performance you've got? If you haven't, that's the first thing to do, before you start changing the code. `XmlDocument` certainly isn't deprecated, although I wouldn't use it in new code unless I had to. I suspect a lot of your time will actually be spent reparsing (when you set the `InnerXml` property). — Jon Skeet, Oct 24 '13 at 06:28

Jon Skeet · Accepted Answer · 2013-10-24T06:30:53.953

2

Is there a room for optimizing the existing code?

Absolutely. The first place I'd change has nothing to do with how you load the XML - it's string concatenation. I'd change your recursiveOperation method to:

private static string RecursiveOperation(XmlNodeList xmlNodeList)
{
    StringBuilder result = new StringBuilder();

    foreach (XmlNode currentNode in xmlNodeList)
    {
        XmlNode clonedNode = currentNode;

        // Remove try/finally block - if an exception is thrown your
        // result will be lost anyway
        string interimMarkup = RecursiveOperation(currentNode.ChildNodes);
        clonedNode.InnerXml = interimMarkup;
        result.Append(Environment.NewLine)
              .Append(clonedNode.OuterXml)
              .Append(Environment.NewLine);
    }
    return result.ToString();
}

It's possible you could optimize this further using a single StringBuilder passed into RecursiveOperation, but I haven't quite got to grips with your code sufficiently to say yet. (It's before the first coffee of the morning.)

In terms of the XML handling itself, you're currently doing a lot of reparsing by setting the OuterXml node in each child (recursively). I suspect if I had a better grasp of what you were doing, it would be feasible to change the whole approach. Given that this is functionality that XmlReader really doesn't have (it wouldn't make sense), it's not clear that you should be taking much notice of the other article at the moment.

How would I go about directly instantiating XmlTextReader from Control, StringWriter or XhtmlTextWriter object? Or do I really need to render it as a string first then instantiate XmlTextReader?

It's not clear what it would even mean to create an XmlTextReader from any of those objects - they're not inherently a source of XML data. I think what you've got already looks reasonable to me.

If you're still concerned about the performance, you should avoid guesswork and use a profiler to measure where the time is being taken. You should set yourself a target first though, otherwise you won't know when you've finished optimizing.

edited Oct 24 '13 at 06:30

answered Oct 24 '13 at 06:05

Jon Skeet

1,421,763
867
9,128
9,194

@Annie: So what performance tests do you already have in place? Are you able to benchmark what improvements you actually get from any of this? – Jon Skeet Oct 24 '13 at 12:31
Currently, I am just doing stress testing in my unit test cases. Like calling control builder with random parameters to build the Control object, then rendering control to generate markup string. When building the control, certain context is unknown which prevent me planting newline craftly. That's why I am doing this *post processing*. – Annie Oct 24 '13 at 13:31
1

So rather than do this randomly, why don't you find *realistic* test cases, work out how fast you need them to be, and then test that. Then you'll know how close you are to meeting your requirements. – Jon Skeet Oct 24 '13 at 13:32
Jon, I have duly noted your points and at this point, I guess I need to wait for the user feedback. Unless there are significant performance concerns, the current approach is good to go. OAN, you implied in an earlier comment that `XmlDocument` won't be your choice in new code. I wonder, what would you use instead; `Linq to XML` or `XmlTextReader`? – Annie Oct 24 '13 at 15:00
1

If I remove try/catch, it fails some tests. I have to check for the condition `cloneNode.NodeType == XmlNodeType.Text` for each node and `continue` to rescue from this error: `Cannot set the 'InnerXml' for the current node because it is either read-only or cannot have children.`. From object-oriented design perspective, I reckon try-catch is a better construct than if-else. – Annie Oct 28 '13 at 07:13
@Annie: Why would you need to wait for *user* feedback? Why can't you determine requirements beforehand, and test against those requirements? You also shouldn't use a try/catch block to fix that error - you should check whether what you're going to try to do makes sense before you try to do it. – Jon Skeet Oct 28 '13 at 07:20
this module is a part of a sub system (a library) which would further be integrated into an actual system which is under development. So the requirements are subjective at this time and vague like "as optimized as possible". I was just trying to figure out if I can trim the executions time, be it milliseconds. Also experimenting the alternate approaches to deal with Html in .NET 4.5.1. Thanks for your great assistance! :) – Annie Oct 30 '13 at 00:59
@Annie: "as optimized is possible" simply isn't a useful requirement at all. There are *always* things that can be done to optimize further, at the cost of simplicity and engineering hours. You should *at least* perform some tests and see whether the timing seems reasonable. I would then move on to other tasks rather than chasing a neverending impossible target. – Jon Skeet Oct 30 '13 at 06:42

Using XhtmlTextWriter with XmlTextReader

Questions:

Edit

1 Answers1