7

I am working on improving a stream reader class that uses a BinaryReader. It consists of a while loop that uses .PeekChar() to check if more data exists to continue processing.

The very first operation is a .ReadInt32() which reads 4 bytes. What if PeekChar only "saw" one byte (or one bit)? This doesn't seem like a reliable way of checking for EOF.

The BinaryReader is constructed using its default parameters, which, as I understand it, means UTF-8 is the default encoding. I assume that .PeekChar() checks for 8 bits, but I really am not sure.

How many bits does .PeekChar() look for? (And what are some alternative methods of checking for EOF?)

JYelton
  • If you have 1 byte left in the stream and do a `ReadInt32`, I'd say that you have an error condition and not an EOF condition... – Lucero Aug 24 '11 at 20:52
  • If the return type is `System.Int32`, wouldn't it read 32 bits? – qJake Aug 24 '11 at 20:53
  • @Lucero I agree; in such a case the file being read was not in the correct format. I'm mostly curious about the under-the-hood goings-on here. :) – JYelton Aug 24 '11 at 20:54
  • @spik : No, it reads a char and then returns a(n unsigned) char-value or -1 in an Int32 – H H Aug 24 '11 at 22:05

3 Answers

4

In the documentation for BinaryReader.PeekChar, I read:

ArgumentException: The current character cannot be decoded into the internal character buffer by using the Encoding selected for the stream.

This makes it clear that the number of bytes read depends on the Encoding applied to that stream.

EDIT

Actually, the definition according to MSDN is:

Returns the next available character and does not advance the byte or character position.

In fact, whether that character is one byte or more depends on the encoding.
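To make the encoding dependence concrete, here is a minimal sketch. It assumes a seekable MemoryStream and the default (UTF-8) BinaryReader; the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class PeekCharWidthDemo
{
    // Returns how many bytes UTF-8 needs to encode the given text.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Peeks at the first character of the data using a BinaryReader
    // built with its default (UTF-8) encoding.
    public static int PeekFirst(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            return reader.PeekChar();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Utf8ByteCount("e"));       // 1
        Console.WriteLine(Utf8ByteCount("\u00E9"));  // 2 (the letter e with an acute accent)
        Console.WriteLine(PeekFirst(new byte[0]));   // -1, nothing left to decode
        Console.WriteLine(PeekFirst(Encoding.UTF8.GetBytes("e"))); // 101, the code of 'e'
    }
}
```

So PeekChar() answers "can one more character be decoded?", which says nothing about whether four bytes remain for a ReadInt32().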

Hope this helps.

Tigran
  • It does appear that encoding dictates how many bits are read. I wasn't able to find information about how many bits UTF8 (the default) uses. – JYelton Aug 24 '11 at 21:02
  • @JYelton: UTF-8 has variable-length characters (1 to 4 bytes), e.g. "e" takes one byte, but "é" takes two. – Thomas Levesque Aug 24 '11 at 21:03
  • @Thomas Does that mean it tries first to create a character using one byte, reading additional bytes as necessary (for a successful character)? – JYelton Aug 24 '11 at 21:04
  • @Thomas, Just confirmation of what you said.. not good english sorry, will edit it now. Actually I removed it :) – Tigran Aug 24 '11 at 21:09
1

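Making your Read*() calls blindly and handling any exceptions that are thrown (EndOfStreamException when the data runs out) is the normal method. I wouldn't rely on the stream position afterwards, though: as far as I can tell, bytes consumed by a partial read are not put back.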
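A rough sketch of that approach, assuming the stream holds nothing but 4-byte integers. ReadAllInt32 and BuildSample are hypothetical helpers, not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadUntilEndDemo
{
    // Reads Int32 values until the reader runs out of data.
    // Assumption: the stream contains only 4-byte integers.
    public static List<int> ReadAllInt32(Stream input)
    {
        var values = new List<int>();
        using (var reader = new BinaryReader(input))
        {
            try
            {
                while (true)
                {
                    values.Add(reader.ReadInt32());
                }
            }
            catch (EndOfStreamException)
            {
                // Fewer than 4 bytes were left: either a clean end of data
                // or a truncated record. This sketch does not tell them apart.
            }
        }
        return values;
    }

    // Hypothetical sample data: two integers followed by one stray byte.
    public static byte[] BuildSample()
    {
        using (var buffer = new MemoryStream())
        using (var writer = new BinaryWriter(buffer))
        {
            writer.Write(1);
            writer.Write(2);
            writer.Write((byte)9);
            writer.Flush();
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        List<int> values = ReadAllInt32(new MemoryStream(BuildSample()));
        Console.WriteLine(values.Count); // 2
    }
}
```

The stray trailing byte is swallowed silently here. If a malformed file should be reported as an error rather than treated as EOF, the catch block is the place to check for that.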

Artfunkel
0

The PeekChar() method of BinaryReader is very buggy. Even when reading from a memory stream with UTF-8 encoded data, PeekChar() throws an exception after reading a particular length of the stream. The BCL team has acknowledged the issue but has not committed to resolving it. Their only response is to avoid using PeekChar() if you can.
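One common way to avoid PeekChar() altogether is to compare the underlying stream's position with its length. This is only a sketch and assumes a seekable stream (a file or MemoryStream, not a network stream); CountInt32Records is a made-up name.

```csharp
using System;
using System.IO;

public static class RemainingBytesDemo
{
    // Counts the complete Int32 records in a seekable stream by checking
    // how many bytes remain before each read, instead of calling PeekChar().
    public static int CountInt32Records(Stream input)
    {
        if (!input.CanSeek)
        {
            throw new NotSupportedException("This sketch needs a seekable stream.");
        }

        int count = 0;
        using (var reader = new BinaryReader(input))
        {
            // Only read when a full 4-byte value is still available.
            while (reader.BaseStream.Length - reader.BaseStream.Position >= sizeof(int))
            {
                reader.ReadInt32();
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // 9 bytes: two complete integers plus one leftover byte.
        Console.WriteLine(CountInt32Records(new MemoryStream(new byte[9]))); // 2
    }
}
```

Because the check is in bytes, it also covers the "only one byte left" case from the question: the loop stops instead of attempting a 4-byte read.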

Pradeep Puranik