
I'm trying to read a binary file in which the data I'm interested in is scattered across the file. Which read pattern is better? (Assume the initial stream position is byte 0.)

  1. read(count=8192), seek(offset=20480, origin=Current), read(count=8192), seek(offset=12288, origin=Current)
  2. read(count=8192), seek(offset=28672, origin=Begin), read(count=8192), seek(offset=49152, origin=Begin)

Since .NET Streams enable me to choose the SeekOrigin, which seek pattern is better, the one starting from SeekOrigin.Begin, or the one that continues seeking from the SeekOrigin.Current position?

Does it matter? Can't the OS just do the calculation itself and decide for me?

Paul

1 Answer


It doesn't matter. SeekOrigin.Current is just a convenience option that saves you from having to keep track of the absolute position yourself. Windows already tracks the position internally, so it has no trouble converting a Current offset to a Begin offset, which is what it really needs. How you figured that the OS could seek to 20480 and then to 12288 automatically is unclear. It can't: Windows has no notion of a record size. A file is just a stream of bytes, with no structure imposed on it.
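To make the equivalence concrete, here is a minimal sketch of the question's two patterns. It is written in Python rather than .NET, on the assumption that `seek(offset, os.SEEK_CUR)` and `seek(offset, os.SEEK_SET)` behave like `Stream.Seek` with `SeekOrigin.Current` and `SeekOrigin.Begin`. The function names and the in-memory test data are made up for illustration.

```python
import io
import os

def pattern_relative(stream):
    """Pattern 1: every seek is relative to the current position."""
    chunks = [stream.read(8192)]
    positions = [stream.seek(20480, os.SEEK_CUR)]      # 8192 + 20480
    chunks.append(stream.read(8192))
    positions.append(stream.seek(12288, os.SEEK_CUR))  # 36864 + 12288
    return positions, chunks

def pattern_absolute(stream):
    """Pattern 2: every seek is measured from the start of the stream."""
    chunks = [stream.read(8192)]
    positions = [stream.seek(28672, os.SEEK_SET)]
    chunks.append(stream.read(8192))
    positions.append(stream.seek(49152, os.SEEK_SET))
    return positions, chunks

# Hypothetical 64 KiB payload standing in for the real file.
data = bytes(i % 251 for i in range(64 * 1024))

relative = pattern_relative(io.BytesIO(data))
absolute = pattern_absolute(io.BytesIO(data))

print(relative[0])           # positions reached after each seek
print(relative == absolute)  # same positions and same bytes either way
```

Both patterns land on the same absolute offsets and return the same bytes, so the choice of origin is purely a matter of which bookkeeping is more convenient for you.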

The order in which you seek does matter, though. Your program runs fastest when it visits the file locations in order. That is a side effect of how data is written to and then read from the disk platter: usually sequentially, if the disk isn't heavily fragmented. The file system cache takes advantage of this by pre-reading data from the same disk track, since it is very cheap to get and fairly likely to be used. By seeking in order, you maximize the odds that the data is already in the cache, so you pay only for a very fast memory-to-memory copy and don't have to wait for the disk.
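If the regions you need are known up front, one way to visit them in order is to sort them by offset before reading. The sketch below assumes exactly that and is again in Python. `read_regions` is a hypothetical helper, not a library function, and the offsets are only examples. It does not measure anything: whether the ordering actually helps depends on the storage device and the cache behavior described above.

```python
import io

def read_regions(stream, regions):
    """Read (offset, length) regions in ascending offset order.

    Returns a dict mapping each offset to the bytes read there.
    """
    results = {}
    for offset, length in sorted(regions):
        stream.seek(offset)  # absolute seek from the start of the stream
        results[offset] = stream.read(length)
    return results

# Hypothetical 64 KiB payload standing in for the real file.
data = bytes(i % 251 for i in range(64 * 1024))

# Regions requested out of order; they are still read front to back.
wanted = [(49152, 16), (0, 16), (28672, 16)]
chunks = read_regions(io.BytesIO(data), wanted)

print(list(chunks))  # offsets in the order they were visited
```

Keying the results by offset lets the caller look up each region regardless of the order in which it was requested.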

Hans Passant