I have the following code:

public class TestStreamReader : StreamReader
{
.
.
    public override int Read([In, Out] char[] buffer, int index, int count)
    {
        char[] charBuffer = new char[buffer.Length];
        int i = base.Read(charBuffer, index, count);
        string s = new string(charBuffer);
        s = s.CleanInvalidXmlChars();
        Buffer.BlockCopy(s.ToCharArray(), index, buffer, index, count);

        return i;
    }
}

But if I make the following call:

XmlReaderSettings settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Ignore
};
using ( DataSet ds = new DataSet() ) {
    using ( TestStreamReader stream = new TestStreamReader(fileName) ) {
        using ( XmlReader tr = XmlReader.Create(stream, settings) ) {
            ds.ReadXml(tr);
            ImportDataSet(ds);
        }
    }
}

    public static string CleanInvalidXmlChars(this string input)
    {
        if ( string.IsNullOrWhiteSpace(input) ) {
            return input;
        }           
        return input.Replace(" ", " ");
    }

I get an exception:

The 'Description' start tag on line 53 position 6 does not match the end tag of 'Descrip'. Line 53, position 156.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.ThrowTagMismatch(NodeData startTag)
   at System.Xml.XmlTextReaderImpl.ParseEndElement()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()

The reason for the exception is that Read is called only once, at the beginning, and is never called again to load the next chunks of data.

Can anybody explain why this happens?

  • ...and what does the exception say...? – mortb Feb 16 '17 at 14:05
  • As I mentioned, the function Read was called only once, so I get only a part of the file. The exception is: The 'Description' start tag on line 53 position 6 does not match the end tag of 'Descrip'. Line 53, position 156. at System.Xml.XmlTextReaderImpl.Throw(Exception e) at System.Xml.XmlTextReaderImpl.ThrowTagMismatch(NodeData startTag) at System.Xml.XmlTextReaderImpl.ParseEndElement() at System.Xml.XmlTextReaderImpl.ParseElementContent() – I. Bespalov Feb 16 '17 at 14:23
  • That exception clearly indicates your XML isn't valid. I'm guessing you need to fix your `CleanInvalidXmlChars` method, which you haven't shared with us. –  Feb 16 '17 at 14:30
  • What are you trying to achieve? Trying to remove invalid XML chars in the `StreamReader` doesn't seem to be a good idea... – Thomas Levesque Feb 16 '17 at 14:33
  • More importantly, if `CleanInvalidXmlCharacters` can *change the length* of the data that you're working with, the `i` value you return is not an accurate reflection of the amount of data you're actually providing. – Damien_The_Unbeliever Feb 16 '17 at 14:35
  • For now I have `public static string CleanInvalidXmlChars(this string input) { if ( string.IsNullOrWhiteSpace(input) ) { return input; } return input.Replace(" ", " "); }`. What can I fix here? – I. Bespalov Feb 16 '17 at 14:36
  • Edit your question and add the code there. Don't put code in comments. –  Feb 16 '17 at 14:36
  • Thomas Levesque: what would you suggest? Read all 1-2 GB into memory as a string, make the replacements, create a new stream and then use that? – I. Bespalov Feb 16 '17 at 14:39
  • You need to debug this. The unsafest assumptions that you'd need to check are - `new string(charBuffer);` converts the entire buffer but the only parts of the buffer that we know are good are those between `index` and `index + i - 1`. That is then passed to the apparently pointless `CleanInvalidXmlChars` function, then converted back to a `char[]` array in which you assume all the data effectively ends up back in the same locations (if so, what was the point of this sequence?). You then overwrite an entire `count` worth of bytes in the original buffer. – Damien_The_Unbeliever Feb 16 '17 at 14:57
  • Damien_The_Unbeliever: good point, I will investigate this. Thank you – I. Bespalov Feb 16 '17 at 15:04
  • @I.Bespalov - You could [answer your own question](http://stackoverflow.com/help/self-answer) so that others will know your question has been answered, and what the answer is. – dbc Feb 16 '17 at 22:58

1 Answer

Buffer.BlockCopy works with bytes, not array elements: its offset and count arguments are all measured in bytes. When copying a char array, the offsets and the count therefore need to be multiplied by sizeof(char), which is 2.
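
To make the byte-versus-element difference concrete, here is a minimal sketch. The class name CharCopy and both helper methods are hypothetical names introduced only for illustration; Buffer.BlockCopy and Array.Copy are the real framework calls.

```csharp
using System;

public static class CharCopy
{
    // Hypothetical helper: copies `count` chars with Buffer.BlockCopy.
    // BlockCopy takes byte offsets and a byte count, so each char-based
    // value is scaled by sizeof(char).
    public static void CopyWithBlockCopy(char[] src, int srcIndex, char[] dst, int dstIndex, int count)
    {
        Buffer.BlockCopy(src, srcIndex * sizeof(char), dst, dstIndex * sizeof(char), count * sizeof(char));
    }

    // Hypothetical helper: the same copy with Array.Copy, which takes
    // its indexes and length in array elements, so no scaling is needed.
    public static void CopyWithArrayCopy(char[] src, int srcIndex, char[] dst, int dstIndex, int count)
    {
        Array.Copy(src, srcIndex, dst, dstIndex, count);
    }
}
```

With a source of "abcdef".ToCharArray() and a count of 4, both helpers copy four chars. An unscaled Buffer.BlockCopy(src, 0, dst, 0, 4) would copy only four bytes, that is, two chars, which fits the truncated 'Descrip' tag in the question.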

My thanks go to Damien_The_Unbeliever for the right hint.

Final code:

public override int Read([In, Out] char[] buffer, int index, int count)
{
    char[] charBuffer = new char[buffer.Length];
    int i = base.Read(charBuffer, index, count);
    string s = new string(charBuffer);
    // Note: this only works while CleanInvalidXmlChars keeps the string length unchanged.
    s = s.CleanInvalidXmlChars();
    char[] tempBuffer = s.ToCharArray();
    // Buffer.BlockCopy takes byte offsets and a byte count, so scale everything by sizeof(char).
    int sizeOfChar = sizeof(char);
    Buffer.BlockCopy(tempBuffer, index*sizeOfChar, buffer, index*sizeOfChar, count*sizeOfChar);

    return i;
}
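
Following the points raised in the comments (only the chars between index and index + i - 1 are known to be good, and the replacement must not change the length), a possible variant is sketched below. It is an assumption-laden sketch, not the code I actually use: the class name CleaningStreamReader is made up, and replacing each invalid char with a space is just one length-preserving choice. It reads straight into the caller's buffer and cleans only the chars that were actually read, so no temporary arrays or strings are needed.

```csharp
using System.IO;
using System.Xml;

// Hypothetical class name, for illustration only.
public class CleaningStreamReader : StreamReader
{
    public CleaningStreamReader(string path) : base(path) { }

    public override int Read(char[] buffer, int index, int count)
    {
        // Let the base class fill the caller's buffer directly.
        int read = base.Read(buffer, index, count);

        // Touch only the chars that were actually read: [index, index + read).
        for (int k = index; k < index + read; k++)
        {
            char c = buffer[k];
            // Surrogates are left alone because a pair may be split across
            // two Read calls; everything else is checked with XmlConvert.IsXmlChar.
            if (!char.IsSurrogate(c) && !XmlConvert.IsXmlChar(c))
            {
                // Assumption: a one-for-one replacement with a space is acceptable,
                // so the number of chars returned stays accurate.
                buffer[k] = ' ';
            }
        }

        return read;
    }
}
```

It can be passed to XmlReader.Create in the same way as TestStreamReader in the question. Whether other Read overloads also need overriding depends on the framework version and on how the consumer reads, which I have not checked.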