
I've been testing file-reading performance in C++ with Visual Studio, and I've got some results that I really don't understand.

My code is as follows:

#include <fstream>
#include <string>
#include <vector>

std::vector<unsigned char> Method1(std::string filePath)
{
    std::basic_ifstream<unsigned char> file(filePath, std::ios::binary);

    // Seek to the end to get the file size, then back to the beginning.
    file.seekg(0, std::ios::end);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);

    // Read the whole file into the buffer in one call.
    std::vector<unsigned char> buffer(size);
    file.read(buffer.data(), size);

    return buffer;
}

(I actually used uint8_t instead of unsigned char, but since it's just a typedef I've used the latter here to better demonstrate the problem)

I gave it a 3 MB file to read, used the std::chrono functions to time it, and these were my results:
Debug Configuration - 10.2 seconds
Release Configuration - 98 ms
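
For reference, the timing was done roughly along these lines (a minimal sketch rather than my exact harness; "test.bin" just stands in for the real file path):

#include <chrono>
#include <iostream>

int main()
{
    // Assumes Method1 from above is visible here.
    auto start = std::chrono::steady_clock::now();
    auto data = Method1("test.bin"); // placeholder path for the 3 MB test file
    auto end = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "Read " << data.size() << " bytes in " << ms << " ms\n";
}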

The big difference between debug and release was already cause for concern.

So I tried replacing all references to "unsigned char" with "char" (std::basic_ifstream<char> instead of std::basic_ifstream<unsigned char> etc.) and re-ran the program.
I found that it ran in under 3 ms in both debug and release.

After a bit more fiddling, I realised that the basic_ifstream<unsigned char> was the problem. If I left everything else as is and only changed the stream to basic_ifstream<char> (with a reinterpret_cast<char *>(buffer.data()) in the read call), then it also ran fine.
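
For clarity, the fast char-based variant looks roughly like this (Method2 is just a name I'm using for the sketch):

#include <fstream>
#include <string>
#include <vector>

std::vector<unsigned char> Method2(std::string filePath)
{
    // Same logic as Method1, but the stream itself traffics in plain char.
    std::ifstream file(filePath, std::ios::binary);

    file.seekg(0, std::ios::end);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);

    std::vector<unsigned char> buffer(size);
    // The buffer still holds unsigned char; only the read call needs a cast.
    file.read(reinterpret_cast<char *>(buffer.data()), size);

    return buffer;
}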

I've verified this on two separate machines, one running the newest release version of Visual Studio 2015 and one running Preview 3 of Visual Studio 15.

Is there some sort of subtlety that makes the basic_ifstream perform so poorly when it's using unsigned chars?

user1000039
  • I know MSVS had a lot of performance problems with streams and they did fix those in the newest 2015 update. Maybe they did not get those changes applied to the `unsigned char` specialization. Also FWIW: http://stackoverflow.com/questions/604431/c-reading-unsigned-char-from-file-stream – NathanOliver Aug 15 '16 at 16:36
  • The `char` specialization of these template classes is pre-compiled and stored in the msvcp140d.dll runtime file. When you use `unsigned char` you get the completely unoptimized version of it, with all the iterator debugging love included. You can disable that, but it would be a mistake. The Debug configuration is meant to help you debug your code, not to run it fast. Test your code with small datasets; making them twice as big does not help you find twice as many bugs. – Hans Passant Aug 15 '16 at 17:08
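
As Hans Passant's comment notes, MSVC's iterator debugging can be switched off, though he advises against it. For completeness, the knob is the `_ITERATOR_DEBUG_LEVEL` macro; a minimal sketch of setting it, assuming you really wanted to:

// Set this project-wide (C/C++ > Preprocessor Definitions) or at the very top
// of every translation unit, before any standard header is included. The value
// must be consistent across the whole binary, or the linker will complain.
#define _ITERATOR_DEBUG_LEVEL 0 // 0 = no checked/debug iterators (the Release default)
#include <fstream>
#include <vector>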

2 Answers


Some of what you're seeing is almost certainly due to caching.

Reading a 3 MB file in 3 ms means you read the file at 1000 megabytes per second, or around 8 gigabits per second (ignoring framing and such, for the moment).

The theoretical maximum speed of a SATA 3 connection is 6 gigabits per second, but there's also an 8b/10b encoding, so the maximum visible speed you can hope for is 4.8 gigabits per second--just over half the speed you got.

Real drives are even more limited than that. Typical spinning hard drives are limited to around 130 megabytes per second, and fast SATA SSDs top out at around 500 MB/s or so. So, unless you have at least two or three fast SSDs striped together in a RAID (and RAID controllers add some overhead of their own, so to hit that kind of speed dependably you'd want a bit more headroom than the bare minimum), you didn't read a 3 MB file from disk in only 3 ms.

There are limits beyond the drives themselves, too. The RAID controller sits on the PCIe bus: PCIe 3.0 has a theoretical maximum bandwidth of about 985 MB/s per lane, but most RAID controllers only use 4 lanes, and a fair number of them only use PCIe gen 2 as well, which is roughly half that per lane. The fastest RAID controllers I know of (eight thousand series Adaptec) all use 8 lane PCIe boards. The point is that sustained reads on the order of a gigabyte per second take fairly serious, purpose-built storage hardware; a single SATA-attached drive simply can't deliver them.

The obvious alternative is that you read a fair amount of that data from main memory instead. Most operating systems cache data in main memory, so if you read the same data twice in (relatively) quick succession, the second time you'll be reading from memory instead of transferring data from the disk.

In other words, the first time you read the data, you get the time to read it from the actual storage device. Subsequent read attempts (that happen fairly soon after the first) can retrieve the data from the cache in main memory, typically giving much faster access (which is exactly why they do caching in the first place).
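
One easy way to see this is to time the same read twice in a row; the first read may have to come from the actual drive, but the second will normally be served from the OS cache. A rough sketch (the TimeRead helper and "test.bin" are just names for the illustration):

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Time one full read of the file, in milliseconds.
static long long TimeRead(const std::string &path)
{
    auto start = std::chrono::steady_clock::now();

    std::ifstream file(path, std::ios::binary);
    file.seekg(0, std::ios::end);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);

    std::vector<char> buffer(size);
    file.read(buffer.data(), size);

    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main()
{
    // The first read may come from the drive; the second usually hits the OS file cache.
    std::cout << "first read:  " << TimeRead("test.bin") << " ms\n";
    std::cout << "second read: " << TimeRead("test.bin") << " ms\n";
}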

Right now it's almost impossible to guess how much of the difference you saw can really be attributed to the difference in your code, and how much to caching--but given the bandwidth you got, there's no real room for question that a good part of what you're seeing must be due to caching, since it's next to impossible that you actually read data from a storage device that fast.

Jerry Coffin

The difference in time is because, in the Debug configuration, Visual Studio compiles the code without optimizations. When built and run in Release, the compiler applies low-level optimizations. See this link for more details, as well as this website.

NonCreature0714