17

I don't have much experience with memory-mapped I/O, but after using it for the first time I'm stunned at how fast it is. In my performance tests, I'm seeing that reading from memory-mapped files is 30X faster than reading through regular C++ stdio.

My test data is a 3GB binary file containing 20 large double-precision floating-point arrays. The way my test program is structured, I call an external module's read method, which uses memory-mapped I/O behind the scenes. Each time I call the read method, this external module returns a pointer and the size of the data that the pointer points to. Upon returning from this method, I call memcpy to copy the contents of the returned buffer into another array. Since I'm doing a memcpy to copy data out of the memory-mapped file, I expected the memory-mapped reads to be not considerably faster than normal stdio, but I'm astonished that they are 30X faster.
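In rough outline, the pattern looks like this (all names are hypothetical, and I've used POSIX mmap for illustration; on my Windows machine the underlying calls would be CreateFileMapping/MapViewOfFile):

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical stand-in for the external module's read method: it maps the
// file and hands back a pointer plus the size of the mapped data.
struct MappedBuffer {
    const char* data;
    size_t size;
};

MappedBuffer map_file(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return { nullptr, 0 };
    off_t len = lseek(fd, 0, SEEK_END);
    void* p = mmap(nullptr, (size_t)len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the fd is closed
    if (p == MAP_FAILED) return { nullptr, 0 };
    return { (const char*)p, (size_t)len };
}

// The caller then memcpy's the returned buffer into its own array,
// exactly as described above.
std::vector<char> read_via_mmap(const char* path) {
    MappedBuffer buf = map_file(path);
    if (!buf.data) return {};
    std::vector<char> out(buf.size);
    std::memcpy(out.data(), buf.data, buf.size);
    munmap((void*)buf.data, buf.size);
    return out;
}
```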

Why is reading from a memory mapped file so fast?

PS: I use a Windows machine. I benchmarked my I/O speeds, and my machine's max disk transfer rate is around 90 MiB/s.

DigitalEye
  • 1,456
  • 3
  • 17
  • 26
  • You may find the answers [here](http://stackoverflow.com/questions/192527/what-are-the-advantages-of-memory-mapped-files) – Steve Lorimer Oct 19 '14 at 22:44
  • @SteveLorimer: I did read that page prior to posting. From what I gather from that thread, if the data is not in memory already, then the OS has to fetch the data from the disk. What I'm seeing in my test is there is no disk i/o that would correspond to a 3GB data transfer, I only see a transfer amounting to 2630 bytes. However, when I examine contents of memcpy-ed array, they match the expected data byte to byte. – DigitalEye Oct 19 '14 at 22:59
  • 10
    Standard benchmark hazard. [Look here](http://superuser.com/questions/417057/is-there-a-way-to-reset-windows-file-cache). – Hans Passant Oct 19 '14 at 23:26
  • 4
    @HansPassant: Thank you for the link. After emptying the Windows file cache between my sample runs, I see that my super-fast memory mapped i/o is 30X slower than what it was before! – DigitalEye Oct 20 '14 at 05:55
  • @DigitalEye in that case, do you still think the answer below is the right explanation? – Baiyan Huang Jun 07 '20 at 12:51
  • @BaiyanHuang: It's been a while, but when I reread the thread, I seem to have observed that all performance gains by memory mapped I/O had been wiped out after dumping file cache. In other words, I seem to have made the case that memory-mapped I/O was not any faster than stdio! I would need to revisit those numbers to be sure there were no errors. But, memory-mapped I/O does exist for performance reasons and the accepted answer describes how it achieves those gains. A new benchmark is in order, I suppose. – DigitalEye Jun 12 '20 at 22:17

1 Answer

30

The OS kernel routines for IO, like read or write calls, are still just functions. Those functions copy data between a userspace buffer and a kernel-space structure, and then to a device. When you consider that there is a user buffer, an IO library buffer (the stdio buffer, for example), a kernel buffer, and then the file, the data may go through up to 3 copies on its way between your program and the disk. The IO routines also have to be robust, and lastly, the syscalls themselves impose latency (trapping to the kernel, context switching, waking the process up again).
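As a sketch, here is the buffered path those copies belong to (a hypothetical helper using plain stdio):

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// The classic buffered read path: each fread may involve
//   disk -> kernel buffer (page cache),
//   kernel buffer -> stdio's internal buffer,
//   stdio buffer -> the caller's array,
// plus a trap into the kernel whenever the stdio buffer must be refilled.
std::vector<double> read_via_stdio(const char* path, size_t count) {
    std::vector<double> out(count);
    FILE* f = std::fopen(path, "rb");
    if (!f) return {};
    size_t got = std::fread(out.data(), sizeof(double), count, f);
    out.resize(got);
    std::fclose(f);
    return out;
}
```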

When you memory map a file, you skip right past much of that, eliminating buffer copies. By effectively treating the file like a big virtual array, you enable random access without going through syscall overhead, so you lower the latency per IO; and if the original code is inefficient (many small random IO calls), the overhead is reduced even more drastically.
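To illustrate the "big virtual array" view, a hypothetical sketch (POSIX mmap shown; MapViewOfFile is the Windows counterpart) where indexing the mapping touches only the pages that back the requested element:

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Returns the i-th double stored in a file of doubles. No read() call per
// access: the kernel faults in the backing page on first touch.
double nth_double(const char* path, size_t i) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0.0;
    off_t len = lseek(fd, 0, SEEK_END);
    void* p = mmap(nullptr, (size_t)len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return 0.0;
    const double* arr = (const double*)p;
    double v = arr[i];  // plain array indexing into the file
    munmap(p, (size_t)len);
    return v;
}
```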

The abstraction of a virtual-memory, multiprocessing OS has a price, and this is it.

You can, however, improve IO in some cases by disabling buffering when you know it will hurt performance, such as for large contiguous writes. But beyond that, you really can't improve on the performance of memory mapped IO without eliminating the OS altogether.
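With stdio, for example, you can drop its internal buffer via setvbuf (a sketch; _IONBF disables buffering, so fread/fwrite move data straight between your buffer and the kernel):

```cpp
#include <cassert>
#include <cstdio>

// Open a file with stdio buffering disabled: for large contiguous
// transfers, the stdio buffer is an extra copy with no benefit.
FILE* open_unbuffered(const char* path, const char* mode) {
    FILE* f = std::fopen(path, mode);
    if (f) std::setvbuf(f, nullptr, _IONBF, 0);
    return f;
}
```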

codenheim
  • 20,467
  • 1
  • 59
  • 80
  • 1
    So would it be fair to say that in my case, the data went directly from the disk to my array i.e. one copy as opposed to three that you are describing: disk to kernel buffer, kernel buffer to i/o buffer, and finally i/o buffer to my program's memory? – DigitalEye Oct 19 '14 at 23:16
  • 3
    Yes. As well, if the kernel maps your file to a set of pages, and the pages aren't in existence (not yet resident), the kernel will page fault and read those pages directly. – codenheim Oct 19 '14 at 23:38