
I have a 16 GB file that I read sequentially from the hard disk in 4 KB blocks, and I want to measure how the reading time changes if I read:

  • one block at a time,
  • one block every 2,
  • one block every 4,
  • ...
  • one block every 512

My code looks like:

...
constexpr size_t B = 1 << 12; // 4 KB block size
constexpr size_t j = 1;       // jump: 1, 2, 4, ..., 512

std::ifstream f(filename, std::ios::binary); // binary mode: the file holds raw bytes
size_t M = N / (j * B); // number of blocks actually read

auto t1 = std::chrono::high_resolution_clock::now();
for (size_t i = 0; i < M; i++) { // size_t avoids a signed/unsigned comparison with M
    f.seekg(i * j * B, f.beg);   // jump to the start of the i-th block to read
    f.read(buff, B);
}
auto t2 = std::chrono::high_resolution_clock::now();
// elapsed time in microseconds
auto elaps = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();

f.close();
...

When I measure the times, however, I see this strange behavior:

j      M    % of f  time (ms)
1   4194304 100,0%  79815,3
2   2097152  50,0%  80141,9
4   1048576  25,0%  79963,0
8   524288   12,5%  79721,7
16  262144    6,3%  79974,9
32  131072    3,1%  80374,9
64  65536     1,6%  80708,3
128 32768     0,8%  80674,9
256 16384     0,4%  80423,3
512 8192      0,2%  17308,4

j is the 'jump': I read one 4 KB block every j blocks. M is the number of blocks actually read while walking through the whole 16 GB file. The file is a binary file containing randomly generated bytes.

What's going on?

  1. Why is the total time constant even though the total number of bytes to be read decreases?
  2. Can the seek be ignored and the file read without skipping?
  3. What happens for j = 512?
00101010
  • Cause: OS cache predictive read-ahead and hard-disk predictive read-ahead. Reasoning: files are typically read sequentially (unless you do OS-specific things to request non-cached access). Re-positioning the heads is very expensive (if you still have spinning rust), and issuing new data transfer requests is also expensive, so it makes sense for the OS and the disk to co-operate in transferring more data than you request (into the cache) in a single read. – Richard Critten May 31 '20 at 15:34
  • In addition to the OS buffering and in-drive caching, `std::ifstream` has its own buffer, as does the underlying cstdio `FILE` object. Really it's buffers all the way down. – Miles Budnek May 31 '20 at 15:48
  • Yeah, it makes sense; in fact, increasing `B` has improved the times. It's just strange that to read 0.4% of the data the OS chooses to read 100% of the file. – 00101010 May 31 '20 at 16:01
