I want to detect the encoding of an XML document before parsing it, so I found this script on Stack Overflow.

public static XElement GetXMLFromStream(Stream uploadStream)
{
    /** Remember position */
    var position = uploadStream.Position;

    /** Get encoding */
    var xmlReader = new XmlTextReader(uploadStream);
    xmlReader.MoveToContent();

    /** Move to remembered position */
    uploadStream.Seek(position, SeekOrigin.Begin); // with position = 0 it does not work either
    uploadStream.Seek(position, SeekOrigin.Current); // if I remove this I have the same issue!

    /** Read content with detected encoding */
    var streamReader = new StreamReader(uploadStream, xmlReader.Encoding);
    var streamReaderString = streamReader.ReadToEnd();
    return XElement.Parse(streamReaderString);
}

But it doesn't work. EndOfStream is always true, even though the stream is not at its end.

For example, I have the string <test></test>. Begin: 0, End: 13.

If I call ReadToEnd or MoveToContent, the end is reached successfully and EndOfStream is true, as expected.

If I then reset the position to 0 (for example) via Seek or Position, a new StreamReader still reports EndOfStream as true.

The thing is that uploadStream is a stream which I cannot close.

It's a SharpZipLib stream wrapping an HTTP upload stream, so I can't close it. I can only work with it as it is.

The only problem is that Position and Seek do not work, and ReadToEnd relies on the position. Otherwise it would work, I think!

Maybe you can help me with this situation :-)

Thank you very much in advance!

Example (screenshot): EndOfStream is true, but the Position is not at the end!

Patrick
  • The zip stream surely obfuscates the problem. A network stream does not support seeking and if you read bytes from it then they are irretrievably lost. Unless you buffer them yourself with a MemoryStream. That can take a lot of memory, you may have to write it to a temp file. – Hans Passant Jun 18 '16 at 12:24

2 Answers

This approach is fundamentally incompatible with some types of input streams. Streams are not required to support Seek at all. In fact, Stream has a property specifically to detect whether Seek is usable, called CanSeek. Code needs to take into account that Seek can fail.

The simple but not very memory-efficient way is to copy your stream's content into a MemoryStream. That one does support Seek, and you can then do whatever you want with it. The fact that you're using ReadToEnd() suggests that the data is not so large that the memory use is going to cause a problem, so you can probably just go with this.
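The buffering step described above could look roughly like the following. This is a minimal sketch, not the asker's final code: the class name `XmlUploadHelper` and the method name `LoadViaBuffer` are made up for illustration, and it assumes the upload is small enough to hold in memory.

```csharp
using System.IO;
using System.Xml.Linq;

public static class XmlUploadHelper
{
    // Hypothetical helper: copies the remaining bytes of a (possibly
    // non-seekable) stream into a MemoryStream, rewinds the copy, and
    // parses it. The original stream is read but never closed here.
    public static XElement LoadViaBuffer(Stream uploadStream)
    {
        using (var buffer = new MemoryStream())
        {
            uploadStream.CopyTo(buffer); // read everything that is left
            buffer.Position = 0;         // MemoryStream supports seeking
            return XElement.Load(buffer);
        }
    }
}
```

Because the copy is a MemoryStream, it can be rewound as often as needed, for example to inspect the encoding first and parse afterwards."));
var el = XmlUploadHelper.LoadViaBuffer(src);
System.Diagnostics.Debug.Assert(el.Name.LocalName == "test");
System.Diagnostics.Debug.Assert(src.CanRead);
</test>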

Note: as documented, if Seek is not supported, it's supposed to throw a NotSupportedException. It looks like with the stream implementation you're dealing with, it's not supported, but not properly implemented. I hope at least that CanSeek returns false for you, so you can still reliably detect this.

  • `CanSeek` is in all cases true. And I got no exception. That's why I ask you here. Maybe it's a bug with *SharpZipLib*? – Patrick Jun 18 '16 at 11:48
  • @Patrick Yeah, that's what I was worried about and why I included my note. That does look like a bug in the implementation, but the workaround is the same: copy everything to a `MemoryStream` and it should work just fine. –  Jun 18 '16 at 12:34
  • Thank you very much @hvd, I found out the same and et voilà, it works! – Patrick Jun 19 '16 at 16:11

Option 1:

XElement has a Load() method that reads directly from an XML stream. It manages the encoding for you internally, and it is more efficient because it avoids a needless intermediate string. So why not use this:

XElement.Load(uploadStream);

Option 2:

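If you really want to work with a string, don't use new XmlTextReader(). The static Create() method (defined on XmlReader, so XmlTextReader.Create() resolves to the same call) has more features, so do this instead:

var xmlReader = XmlReader.Create(uploadStream);
// Position the reader on the root element first; ReadOuterXml()
// returns an empty string unless the reader is on an element or attribute.
xmlReader.MoveToContent();
var streamReaderString = xmlReader.ReadOuterXml();
return XElement.Parse(streamReaderString);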
Phil Blackburn
  • Good call, the OP does state "I want to detect the encoding of a XML document before parsing it." but doesn't explain why and doesn't use the encoding except for parsing. –  Jun 18 '16 at 12:35