
I have tried all sorts of different combinations of flags such as FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED, but fstream::write still beats the Windows API version.

Does std::fstream use internal buffering or other tricks, or am I just messing something up?

#include <Windows.h>
#include <chrono>
#include <fstream>
#include <iostream>

std::string createTempFileName()
{
    char buf[800];
    tmpnam_s(buf, sizeof(buf));
    return buf;
}

using namespace std::chrono;

int main()
{
    std::uint64_t count = 1 << 23;

    std::cout << "test fstream\n";
    {
        auto start = steady_clock::now();
        auto path = createTempFileName();
        std::fstream fs(path, std::ios_base::out | std::ios_base::in | std::ios_base::trunc | std::ios_base::binary);
        for (std::uint64_t i = 0; i < count; i++)
            fs.write((char*)&i, sizeof(i));
        fs.close();
        DeleteFile(path.c_str());
        auto end = steady_clock::now();
        std::cout << "fstream: Elapsed time in milliseconds : " << duration_cast<milliseconds>(end - start).count() << " ms\n";
    }

    std::cout << "test WriteFile\n";
    {
        auto start = steady_clock::now();
        auto path = createTempFileName();
        HANDLE file = CreateFile(path.c_str(), GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_FLAG_NO_BUFFERING, NULL);
        for (std::uint64_t i = 0; i < count; i++)
            WriteFile(file, &i, sizeof(i), NULL, NULL);
        CloseHandle(file);
        DeleteFile(path.c_str());
        auto end = steady_clock::now();
        std::cout << "WriteFile: Elapsed time in milliseconds : " << duration_cast<milliseconds>(end - start).count() << " ms\n";
    }
}
jdt
  • Having a quick look at the [source](https://github.com/microsoft/STL/blob/main/stl/inc/fstream) indicates that MSVC's `fstream` uses `fwrite()`. So perhaps the following Q/A could provide some insight: https://stackoverflow.com/questions/14290337/is-fwrite-faster-than-writefile-in-windows – Nov 09 '21 at 18:04
  • @Frank, I saw that post before writing this question but it did not really help, thanks anyway =) – jdt Nov 09 '21 at 18:15
  • What are the results on your PC? What PC do you have? What disk do you have? Why should "NO_BUFFERING" be fast? And why did reading the linked article not help? – Thomas Weller Nov 09 '21 at 19:26
  • There's a meta discussion about [what makes a good performance question](https://meta.stackoverflow.com/questions/412875/what-makes-a-good-performance-question-on-so). This one is not a good performance question, IMHO. It's missing all the performance related information. – Thomas Weller Nov 09 '21 at 19:30
  • This looks like a variation of the _blame the tool_ kind of question. There aren’t many ways to write to files on Windows. `fwrite()` very likely uses `WriteFile()` underneath. The real question should be _what am I doing wrong?_ – Dúthomhas Nov 09 '21 at 20:20

1 Answer


I suppose the obvious answer is to look at the source shipped with Visual Studio but since you chose not to do that, let us use our magic thinking hats instead.

There are not that many documented ways to write to a file on Windows, and the C/C++ run-time has to use some Windows API to write. Ranked by popularity, I would guess we are looking at WriteFile, memory-mapped files, and IStream. IStream is COM and too high-level. Memory-mapped files are annoying when you don't know the final size, so WriteFile is the most likely candidate.

WriteFile does some minor work in user mode, but at the end of the day it has to make a system call and transition into kernel mode.

Your loop writes only 8 bytes per call, so a really large part of the CPU time is going to be spent switching in and out of kernel mode. Whatever buffering the kernel (and the storage hardware) does for this file handle does not remove that per-call cost.
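To make the per-call cost concrete, here is a minimal sketch of user-side batching. `BatchingWriter` and its `sink` callback are hypothetical names, not part of the question's code or of any Windows API; in the question's benchmark the sink would presumably wrap a checked `WriteFile` call. This is not how the run-time implements its buffering, only an illustration of the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: collects small writes in memory and hands them to
// `sink` only when the buffer is full (or when flush() is called).
class BatchingWriter {
public:
    using Sink = std::function<void(const char*, std::size_t)>;

    BatchingWriter(std::size_t capacity, Sink sink)
        : buffer_(capacity), sink_(std::move(sink)) {
        assert(capacity > 0);
    }

    // Copies `size` bytes into the buffer, flushing whenever it fills up.
    void write(const void* data, std::size_t size) {
        const char* bytes = static_cast<const char*>(data);
        while (size > 0) {
            const std::size_t chunk = std::min(size, buffer_.size() - used_);
            std::memcpy(buffer_.data() + used_, bytes, chunk);
            used_ += chunk;
            bytes += chunk;
            size -= chunk;
            if (used_ == buffer_.size())
                flush();
        }
    }

    // Passes any pending bytes to the sink. The caller must call this once
    // at the end; the destructor deliberately does not flush.
    void flush() {
        if (used_ == 0)
            return;
        sink_(buffer_.data(), used_);
        used_ = 0;
    }

private:
    std::vector<char> buffer_;
    std::size_t used_ = 0;
    Sink sink_;
};
```

With a 16-byte buffer and 8-byte values, the sink runs once for every two calls to `write`, so the number of expensive calls is halved. A buffer of several kilobytes would reduce it much further.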

You are passing the FILE_FLAG_NO_BUFFERING flag to CreateFile, yet you do none of the work required to make these writes aligned, and you never check the return value of WriteFile! It might be failing, and this entire test could be invalid for all we know.
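As a rough sketch of what that missing work looks like: with FILE_FLAG_NO_BUFFERING, the write size, the file offset, and the buffer address all need to be multiples of the sector size. The helpers below (`roundUpToSector`, `isSectorAligned`, `checkedWrite`) are hypothetical names introduced for illustration. The sector size is passed in as a parameter because the real value has to be queried from the volume; 512 and 4096 are only common values, not something to assume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef _WIN32
#include <Windows.h>
#endif

// Rounds `size` up to the next multiple of `sectorSize`.
// Assumes `sectorSize` is a non-zero power of two.
inline std::size_t roundUpToSector(std::size_t size, std::size_t sectorSize) {
    assert(sectorSize != 0 && (sectorSize & (sectorSize - 1)) == 0);
    return (size + sectorSize - 1) & ~(sectorSize - 1);
}

// Returns true if the address `p` is a multiple of `sectorSize`.
inline bool isSectorAligned(const void* p, std::size_t sectorSize) {
    assert(sectorSize != 0 && (sectorSize & (sectorSize - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (sectorSize - 1)) == 0;
}

#ifdef _WIN32
// Hypothetical wrapper: a write counts as successful only if WriteFile
// reports success and every requested byte was written. On failure the
// caller can inspect GetLastError().
inline bool checkedWrite(HANDLE file, const void* data, DWORD size) {
    DWORD written = 0;
    if (!WriteFile(file, data, size, &written, nullptr))
        return false;
    return written == size;
}
#endif
```

An 8-byte write passes neither check, which is why the benchmark's `WriteFile` calls should be verified before their timings are trusted.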

The C/C++ run-time is often willing to choose speed over size/memory usage, and the magic word here is buffering. Even a tiny 16-byte buffer would probably almost double your speed in this specific instance. You can try to turn off this buffering with something like this:

std::fstream fs;
fs.rdbuf()->pubsetbuf(NULL, 0);
fs.open(...
Anders
  • I tried creating a buffer and writing it all at once (and only timing the calls to write output, not the loops, file creation, etc). `fstream` is consistently much faster, for some reason. The source code of `fstream` is quite mangled, but it appears to eventually call `fwrite`. I don't have the source code for `fwrite` right now. `fs.rdbuf()->pubsetbuf(NULL, 0);` didn't change the result. – Aykhan Hagverdili Nov 09 '21 at 19:51
  • Did you remove FILE_FLAG_NO_BUFFERING? Have you tried putting the WriteFile test before the fstream test? Have you added error checking? – Anders Nov 09 '21 at 20:01
  • In OP's code, if you remove that `FILE_FLAG_NO_BUFFERING`, the execution time for `WriteFile` skyrockets (slows down by 20x). But in my externally-buffered version with a single call to `WriteFile`, if I remove `FILE_FLAG_NO_BUFFERING`, execution speed becomes equivalent to `fstream`. Kind of strange. – Aykhan Hagverdili Nov 09 '21 at 20:15
  • I guess `fwrite` does smarter buffering. I am also guessing `WriteFile` could be tuned to outperform it with careful attention to the arguments. I don't have any evidence to back these guesses up though. – Aykhan Hagverdili Nov 09 '21 at 20:19
  • FILE_FLAG_NO_BUFFERING is for advanced usage and if the I/O is not sector aligned, who knows what happens, probably undefined behavior. – Anders Nov 09 '21 at 20:25