
I am using mmap to read a large database file (say, 100GB), for which indexes are kept in main memory (key-offset pairs).

Because of the default 4KB virtual memory page size, I assume that read calls on the file system will also use blocks of 4KB. However, that is quite inefficient for the access patterns of my application. Thus, I was investigating the possibility of using huge pages to transparently increase the size of the I/O units from 4KB to 2MB.

The typical use of huge pages seems to be to improve memory allocation and TLB utilization, but I can't find any information on how that relates to actual file I/O. With mmap, it seems like huge pages are only supported for private anonymous maps. Is that assumption correct? I also tried looking into libhugetlbfs, but couldn't find out how I can read an actual file with it.

So, is there a way to access a file transparently using mmap and use I/O units larger than 4KB?

Caetano Sauer
  • You'll need a *lot* of RAM to do that. Every time you need to read data that isn't already in memory, if there aren't any 2 MB pages available the kernel will have to coalesce one. Too much of that (and it won't take much...) will cause disastrous performance problems and is likely to wake the OOM killer. And how much data will each random read access? If it's not in the MB range or more, then whenever mapped pages need to be evicted to bring in the needed data, 2 MB will be read even if you only need 1 byte. Benchmark this mmap solution against alternatives, too, like using `pread()`. – Andrew Henle Apr 30 '17 at 12:14
  • I appreciate your comment, but it does not address the problem of how to actually read the file with huge pages. Before looking into the performance issues, I would like to know whether it is possible at all. Thanks. – Caetano Sauer Apr 30 '17 at 12:36

1 Answer


Linux does not support huge pages for the page cache (and the same is true of other OSes).

The most important reason is that the page cache is shared by every process in the system and by the kernel itself.

Consider the following scenario: your process maps a file using 2MB huge pages, but then another process maps it using regular 4KB pages. The only way to handle this is to switch your process to 4KB pages on the fly, so starting with 2MB pages would have been pointless in the first place.

What you actually need is to ask the kernel to start prefetching data, using either posix_fadvise() with POSIX_FADV_WILLNEED or madvise() with MADV_WILLNEED. Doing a syscall is not "free", but if you know you are going to access a 2MB region soon, these hints should be a perfect fit.

For additional information, you can read this to get more insight into what kernel developers think (or thought) about huge pages.