6

The closest I've come to finally understanding the architecture of on disk btrees is this.

It's simple and very easy to read and understand. But I still feel confused. It doesn't seem like there is any in memory data structure at all. Am I missing something? What makes this a btree? Is it just the array of longs that "point" to the keys of their child nodes? Is that efficient? Is that just how most databases and filesystems are designed?

Are there ways of implementing on disk btrees (or other data structures) in memory? Where each nodes contains a file offset or something?

Toon Krijthe
  • 52,876
  • 38
  • 145
  • 202
alf
  • 681
  • 1
  • 8
  • 19
  • What in-memory data structures do you expect? A b-tree on disk is supposed to outlive the process that uses it.; An in-memory tree is just like the on-disk one (what major differences do you expect, except maybe different choices in the small). – usr Sep 27 '12 at 20:45
  • @usr I'm just confused as to how on disk data structures are implemented. How do nodes point to their child nodes? Do they hold references to their offsets? Their keys? Are single trees typically held in a single file? – alf Oct 01 '12 at 22:31

1 Answers1

2

Node pointers are typically stored on disk as addresses (for example using long integers).

In general an implementation choose to use either physical or logical addresses:

  • Physical addresses specify the actual offset (within a file or similar) where the node is stored.
  • In contrast, logical addresses require some kind of mechanism that resolve to a physical address each time a pointer is navigated/traversed.

Physical addressing is faster (because no resolve mechanism is needed). However, logical addressing could allow nodes to be reorganized without having to rewrite pointers. The ability to be able to re-organize nodes in this way can be used as the basis for implementing good clustering, space utilization and even low-level data distribution.

Some implementations use a combination of logical and physical addressing such that each address is composed of a logical address that refer (dynamically) to a segment (blob) and a physical address within that segment.

It is important to note that node addresses are disk based, therefore they cannot be directly translated to in-memory pointers.

In some cases it is beneficial to convert disk-based pointers to memory pointers when data is loaded into memory (and then convert back to disk-based pointers when writing).

This conversion is sometimes called pointer swizzling, and can be implemented in many ways. The fundamental idea is that the data addressed by swizzled in-memory pointers shall not be loaded into memory before the pointer is navigated/traversed.

The common approaches to this is to either use a logical in-memory addressing scheme or to use memory mapped files. Memory mapped files use virtual memory addressing in which memory pages are not loaded into memory before they are accessed. Virtual memory mapped files are provided by the OS. This approach is sometimes called page faulted addressing because accessing data on a memory page that is not mapped into memory yet will cause a page fault interrupt.

Mårten Wikström
  • 11,074
  • 5
  • 47
  • 87