InnoDB uses a buffer pool of configurable size to store recently used pages (B+tree blocks).

Why not mmap the entire file instead? Granted, this does not work for changed pages, because you want to store them in the doublewrite buffer before writing them back to their final location. But mmap lets the kernel manage the LRU for pages and avoids copying into userspace. Also, in-kernel copy code does not use vector instructions (to avoid saving their registers in the process context).

But when a page is unchanged, why not use mmap to read it and let the kernel cache it in the filesystem RAM cache? Then you would need a "custom" userspace cache for changed pages only.
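To make the copy-versus-mapping distinction concrete, here is a minimal Python sketch (not InnoDB or LMDB code). It assumes a POSIX system, and the 16 KiB `PAGE_SIZE` and the helper names `make_data_file`, `read_page_copy`, and `read_page_mapped` are made up for illustration. `os.pread` copies a page out of the kernel's cache into a new userspace object, while slicing a `memoryview` over the mapping refers to the mapped bytes without that copy.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 16 * 1024  # illustrative page size, an assumption for this sketch


def make_data_file(num_pages):
    """Create a temp file where page N is filled with the byte value N."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for page_no in range(num_pages):
            f.write(bytes([page_no % 256]) * PAGE_SIZE)
    return path


def read_page_copy(fd, page_no):
    """Buffer-pool style: pread copies the page into a new bytes object."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def read_page_mapped(view, page_no):
    """mmap style: a memoryview slice points into the mapping, no copy."""
    start = page_no * PAGE_SIZE
    return view[start:start + PAGE_SIZE]


def demo():
    path = make_data_file(4)
    fd = os.open(path, os.O_RDONLY)
    try:
        copied = read_page_copy(fd, 2)
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapping:
            with memoryview(mapping) as view:
                with read_page_mapped(view, 2) as page:
                    # bytes(page) copies only so the two reads can be compared
                    same = bytes(page) == copied
                    first = page[0]
        return same, first
    finally:
        os.close(fd)
        os.unlink(path)
```

Both paths return the same bytes. The difference is who owns the cached copy and who decides when it is evicted: the engine in the first case, the kernel in the second.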

The LMDB author mentioned that he chose the mmap approach to avoid copying data from the filesystem cache to userspace and to avoid reinventing LRU.

What critical disadvantages of mmap am I missing that lead to the buffer pool approach?

pavelkolodin

1 Answer

Disadvantages of MMAP:

  • Not all operating systems support it (ahem Windows)

  • Coarse locking. It's difficult to allow many clients to make concurrent access to the file.

  • Relying on the OS to buffer I/O writes leads to increased risk of data loss if the RDBMS engine crashes. You need to use a journaling filesystem, which may not be supported on all operating systems.

  • Can only map a file size up to the size of the virtual memory address space, so on 32-bit OS, the database files are limited to 4GB (per comment from Roger Lipscombe above).
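As an illustration of the write-buffering point, here is a minimal Python sketch of updating a file through a shared mapping and then forcing the change out explicitly. The function name `update_with_explicit_flush` and the one-page test file are assumptions made for this example. Until `flush()` is called, the kernel chooses when the dirty page reaches the disk, which is the control a database engine with its own buffer pool keeps for itself.

```python
import mmap
import os
import tempfile


def update_with_explicit_flush(path, offset, payload):
    """Modify a file through a writable mapping, then force it to disk."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mapping:
            # The store only dirties a page in memory; nothing is written yet
            mapping[offset:offset + len(payload)] = payload
            # Ask the OS to write the dirty pages back now (msync on POSIX)
            mapping.flush()


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * mmap.PAGESIZE)  # one zero-filled page
        update_with_explicit_flush(path, 8, b"committed")
        with open(path, "rb") as f:
            f.seek(8)
            return f.read(9)
    finally:
        os.unlink(path)
```

Note that flushing the whole mapping says nothing about the order in which individual pages are written, so on its own it does not replace a write-ahead log or a doublewrite buffer.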

Early versions of MongoDB tried to use MMAP in the primary storage engine (the only storage engine in the earliest MongoDB). Since then, they have introduced other storage engines, notably WiredTiger. This has greater support for tuning, better performance on multicore systems, support for encryption and compression, multi-document transactions, and so on.

Bill Karwin