I wrote this small program, which reads every 5th character from Random.txt. In Random.txt I have one line of text: ABCDEFGHIJKLMNOPRST. I got the expected result:

  • Position of A is 0
  • Position of F is 5
  • Position of K is 10
  • Position of P is 15

Here is the code:

static void Main(string[] args)
{
    StreamReader fp;
    int n;
    fp = new StreamReader("d:\\RANDOM.txt");
    long previousBSposition = fp.BaseStream.Position;
    //In this point BaseStream.Position is 0, as expected
    n = 0;

    while (!fp.EndOfStream)
    {
        //After !fp.EndOfStream is evaluated, BaseStream.Position has changed to 19,
        //so I have to reset it to the previous position :S
        fp.BaseStream.Seek(previousBSposition, SeekOrigin.Begin);
        Console.WriteLine("Position of " + Convert.ToChar(fp.Read()) + " is " + fp.BaseStream.Position);
        n = n + 5;
        fp.DiscardBufferedData();
        fp.BaseStream.Seek(n, SeekOrigin.Begin);
        previousBSposition = fp.BaseStream.Position;
    }
}

My question is: why, after the line while (!fp.EndOfStream), is BaseStream.Position changed to 19, i.e. the end of the BaseStream? I expected, obviously wrongly, that BaseStream.Position would stay the same when I check EndOfStream.

Thanks.

svick
vldmrrdjcc
  • StreamReader has an internal buffer to allow decoding bytes to text. Using any of its methods is going to cause it to slurp bytes from the file stream. Its Position value is going to be unpredictable. – Hans Passant Sep 29 '11 at 12:37
  • @HansPassant, I think that is the reason for the call to `DiscardBufferedData()` in the posted code. – svick Sep 29 '11 at 13:51
  • @HansPassant, yes, I played with my code above, and I noticed that Read() method of StreamReader also causes change of the BaseStream.Position in some situations, so it is unpredictable. – vldmrrdjcc Sep 29 '11 at 17:25

3 Answers

The only certain way to find out whether a Stream is at its end is to actually read something from it and check whether the return value is 0. (StreamReader has another way, checking its internal buffer, but you correctly don't let it do that by calling DiscardBufferedData.)

So EndOfStream has to read at least one byte from the base stream. And since reading byte by byte is inefficient, it reads a larger block into its buffer instead. That's why the call to EndOfStream moves the position to the end of your file (for a bigger file, it wouldn't be the end).
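
A small sketch for observing this effect. The temporary file and the names `EndOfStreamDemo` and `Probe` are made up for illustration and are not part of the question's code. It records BaseStream.Position before and after the EndOfStream check:

```csharp
using System;
using System.IO;

static class EndOfStreamDemo
{
    // Hypothetical helper: returns the base stream position before and
    // after EndOfStream is evaluated on a freshly opened StreamReader.
    public static (long Before, long After) Probe(string path)
    {
        using (var reader = new StreamReader(path))
        {
            long before = reader.BaseStream.Position;
            // Checking EndOfStream makes the reader fill its internal buffer,
            // which reads from (and so advances) the base stream.
            _ = reader.EndOfStream;
            long after = reader.BaseStream.Position;
            return (before, after);
        }
    }

    static void Main()
    {
        // Assumption: a throwaway temp file stands in for d:\RANDOM.txt.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "ABCDEFGHIJKLMNOPRST");

        var (before, after) = Probe(path);
        Console.WriteLine("Before: {0}, after: {1}", before, after);

        File.Delete(path);
    }
}
```

With the 19-character file from the question, the second value should come out as 19, matching what the asker observed. For a file larger than the reader's buffer, it would be smaller than the file length.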

It seems you don't actually need to use StreamReader, so you should use Stream (or specifically FileStream) directly:

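// Requires: using System; using System.IO;
using (Stream fp = new FileStream(@"d:\RANDOM.txt", FileMode.Open, FileAccess.Read))
{
    int n = 0;

    while (true)
    {
        // ReadByte returns -1 when there is nothing left to read.
        int read = fp.ReadByte();
        if (read == -1)
            break;

        // Casting a byte to char only works for single-byte text such as ASCII.
        char c = (char)read;
        // Print n rather than fp.Position: ReadByte has already advanced Position by one.
        Console.WriteLine("Position of {0} is {1}.", c, n);
        n += 5;
        fp.Position = n;
    }
}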

(I'm not sure what setting the position beyond the end of the file does in this situation; you may need to add a check for that.)
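
One way to sidestep that question is to compare the offset against the stream's Length before seeking. The following is a minimal sketch under that assumption; the `EveryNthByte` class, its `Read` method, and the temporary file are hypothetical names used only for illustration, and it assumes single-byte (ASCII) text like the question's file:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class EveryNthByte
{
    // Hypothetical helper: collects the byte at offsets 0, step, 2*step, ...
    // and never seeks past the end of the file.
    public static List<KeyValuePair<long, char>> Read(string path, int step)
    {
        if (step <= 0)
            throw new ArgumentOutOfRangeException(nameof(step));

        var result = new List<KeyValuePair<long, char>>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // The loop condition is the explicit end-of-file check.
            for (long offset = 0; offset < fs.Length; offset += step)
            {
                fs.Position = offset;
                int b = fs.ReadByte();
                if (b == -1)
                    break; // defensive: the file shrank while we were reading

                result.Add(new KeyValuePair<long, char>(offset, (char)b));
            }
        }
        return result;
    }

    static void Main()
    {
        // Assumption: a throwaway temp file stands in for d:\RANDOM.txt.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "ABCDEFGHIJKLMNOPRST");

        foreach (var pair in Read(path, 5))
            Console.WriteLine("Position of {0} is {1}", pair.Value, pair.Key);

        File.Delete(path);
    }
}
```

Because the offset is tracked in a local variable rather than read back from the stream, the printed positions are the ones that were actually requested.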

svick
  • I think I understand. Just one more question: StreamReader doesn't buffer the entire stream, only part of it, so at some unpredictable moment, when it needs more data, it will read from the BaseStream? Did I get it? – vldmrrdjcc Sep 29 '11 at 18:05
  • Pretty much, yeah. Except it's not that unpredictable. The base stream is read anytime when the buffer is empty and you need to read some bytes. Especially in the code you posted, since you never read much from the buffer before discarding it. – svick Sep 29 '11 at 19:09
  • It seems that EndOfStream reads exactly the 1024 next bytes. If you DiscardBufferedData after an EndOfStream call and then ReadLine you will get the part of the line that comes after 1024 characters. – Stefanos Kargas Jan 18 '13 at 16:38

The base stream's Position property refers to the position of the last read byte in the buffer, not the actual position of the StreamReader's cursor.

Saeb Amini
  • While the documentation doesn't say this clearly, `Position` really does refer to bytes, not buffers. I'm not sure what that would even mean. – svick Sep 29 '11 at 13:41

You are right, and I could reproduce your issue as well. Anyway, according to MSDN (Read Text from a File), the proper way to read a text file with a StreamReader is the following (the using block also always closes and disposes the stream):

try
{
    // Create an instance of StreamReader to read from a file.
    // The using statement also closes the StreamReader.
    using (StreamReader sr = new StreamReader("TestFile.txt"))
    {
        String line;
        // Read and display lines from the file until the end of
        // the file is reached.
        while ((line = sr.ReadLine()) != null)
        {
            Console.WriteLine(line);
        }
    }
}
catch (Exception e)
{
    // Let the user know what went wrong.
    Console.WriteLine("The file could not be read:");
    Console.WriteLine(e.Message);
}
Davide Piras
  • If you put a breakpoint on the line while (!fp.EndOfStream), you will see that BaseStream.Position changes right after that line, and before Console.WriteLine(... – vldmrrdjcc Sep 29 '11 at 12:18
  • Why don't you simply read Position instead of BaseStream.Position? – Davide Piras Sep 29 '11 at 12:21
  • StreamReader doesn't have its own Position property :S – vldmrrdjcc Sep 29 '11 at 12:23
  • You are right, and actually I could reproduce your issue in a snippet. Is there any reason why you can't read the stream like in my snippet above? – Davide Piras Sep 29 '11 at 12:27
  • @DavidePiras, maybe the file structure isn't line-based. And maybe it's too big to read into memory all at once. – svick Sep 29 '11 at 13:52