18

I was reading about skip lists and MemSQL and was wondering why skip lists are not more widely used in databases. Are there any major disadvantages to using skip lists?

templatetypedef
lucidxistence

3 Answers

29

Databases are typically so huge that they have to be stored in external memory, such as a giant disk drive. As a result, the bottleneck in most database applications is the number of times that we have to do a memory transfer from the disk drive into main memory.

B-trees and their variants are specifically designed to minimize the number of block reads and writes necessary to perform each of their operations. Mathematically, the number of memory transfers required for each B-tree operation is O(log n / log B), where B is the block size. Compare this to a skip list, which requires O(log n) memory transfers in expectation. Since B is usually measured in megabytes, log B can be in the neighborhood of 15 to 25, so the B-tree can be significantly faster. Even when the database is in main memory, the effect of the memory hierarchy (L1 and L2 caches, etc.) can be so pronounced that B-tree variants are still faster in practice than many other data structures. This Google blog post gives some background on this.
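To put rough numbers on those two bounds, here is a back-of-the-envelope sketch. It ignores the constant factors hidden by the big-O notation, and it assumes B counts keys per block (1024 here) rather than bytes. The function names and the values of n and B are made up purely for illustration.

```python
import math

def btree_transfers(n: int, keys_per_block: int) -> float:
    """Approximate block reads for one B-tree lookup: log(n) / log(B)."""
    return math.log(n) / math.log(keys_per_block)

def skiplist_transfers(n: int) -> float:
    """Approximate expected block reads for one plain skip list lookup: log2(n)."""
    return math.log2(n)

n = 10**9   # hypothetical number of keys
B = 1024    # hypothetical number of keys that fit in one block

print(f"B-tree:    ~{btree_transfers(n, B):.1f} transfers")   # roughly 3
print(f"skip list: ~{skiplist_transfers(n):.1f} transfers")   # roughly 30
```

With these assumed numbers the gap is about a factor of log2(B) = 10, which is the ratio between the two bounds.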

Although each operation on a B-tree typically requires more CPU work than corresponding operations in other data structures, the fact that they require so few memory transfers tends to make them significantly faster in practice than other data structures. Therefore, it would not be advisable to use a skip list in a database.

There's another reason B-trees are nice: they're worst-case efficient. Although deterministic skip lists do exist, most skip list implementations are randomized and give only expected guarantees on their behavior. In a database, this might be unacceptable because many database use cases require worst-case efficient behavior.
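In case it helps to see where the randomness comes in, here is a minimal sketch of a randomized skip list. It is not taken from any particular database; the class names, the `MAX_HEIGHT` cap, and the coin-flip probability of 0.5 are all illustrative choices. The height of each node is decided by coin flips, so the O(log n) search cost holds only in expectation.

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] is the successor on level i

class SkipList:
    MAX_HEIGHT = 16  # illustrative cap on tower height

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = Node(None, self.MAX_HEIGHT)  # sentinel; its key is never compared
        self.height = 1

    def _random_height(self):
        # Flip a fair coin until it comes up tails: this is the randomized part.
        h = 1
        while h < self.MAX_HEIGHT and self.rng.random() < 0.5:
            h += 1
        return h

    def insert(self, key):
        # update[lvl] is the last node on level lvl whose key is < key.
        update = [self.head] * self.MAX_HEIGHT
        node = self.head
        for lvl in range(self.height - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            update[lvl] = node
        h = self._random_height()
        self.height = max(self.height, h)
        new = Node(key, h)
        for lvl in range(h):
            new.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = new

    def search(self, key):
        """Return (found, hops), where hops counts forward pointers followed."""
        node, hops = self.head, 0
        for lvl in range(self.height - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
                hops += 1
        node = node.next[0]
        return (node is not None and node.key == key), hops

sl = SkipList(seed=42)
for k in random.Random(0).sample(range(1000), 1000):
    sl.insert(k)
print(sl.search(500))
```

In a disk-backed setting each hop could land on a different block, and an unlucky sequence of coin flips can make a search noticeably longer than the average case.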

Hope this helps!

templatetypedef
  • A well-written and insightful answer. Hit all the points I needed to know. Thank you! – TheAddonDepot Jan 13 '17 at 13:54
  • Can't skip lists capture the same advantages by packing B elements per node? – Joseph Garvin Jan 12 '22 at 06:00
  • That’s an interesting point and I honestly hadn’t thought of that! I suppose you could do that. Let me think about that… – templatetypedef Jan 12 '22 at 16:06
  • @JosephGarvin: A skip list that stores B elements per node would require O(log(n/B)) = O(log(n) - log(B)) lookups (branching factor is still 2). To match a B-tree, you'd have to also group the "skip" links within each layer into blocks, and at that point you've basically got a B-tree. – tom Jul 01 '22 at 03:37
  • @tom I was imagining, say, a block that stores 2^n elements contiguously. You would keep only one set of skip links in that block (once you find the block you need, you do a regular binary search on the contiguous elements). You could keep them in the same allocation, adjacent to the elements. Is that what you're saying is basically a B-Tree? – Joseph Garvin Jul 01 '22 at 17:14
  • @JosephGarvin: I thought you meant something like this: https://i.stack.imgur.com/zO9Ji.png (B elements per green block; requires up to log(n/B) memory accesses to locate a green block). Here is what I referred to when I said "grouping the skip links into blocks" (basically a B-Tree): https://i.stack.imgur.com/6IEz0.png; is that what you originally meant? [This paper](https://arxiv.org/abs/1005.0662) describes the same idea (see Figure 2) and calls it a "B-Skip-List". – tom Jul 03 '22 at 15:36
  • @tom What I mean is more like the first picture, except both the skip list nodes and the data are collapsed into one alloc. So if you want a chunk with 32 elements and skip height 4, you allocate sizeof(pointer)*4+sizeof(element)*32 bytes and use the first bytes for the skip list and the rest for the elements. – Joseph Garvin Jul 04 '22 at 16:55
  • @JosephGarvin: Right, so my original comment stands – the number of green blocks is n/B, so the expected number of blue blocks that need to be accessed is log₂(n/B). Collapsing as you describe doesn't make much difference, it just saves one memory access at the very end. – tom Jul 05 '22 at 08:47
7

I know it's late in the game, but I felt the urge to reply because the top-rated answer perhaps doesn't convey the complete picture.

Skip lists differ from balanced tree data structures in that they allow several lists to be combined efficiently. In database terms, this means indexes based on skip lists can be combined efficiently. A good example is Lucene, which powers search engines like Solr and Elasticsearch: https://issues.apache.org/jira/browse/LUCENE-866.

B-trees have trouble combining multiple indexes unless the combination has been indexed a priori, which is inefficient because it requires re-indexing historical records.

Hence, whenever a data store has to support arbitrary queries on its data, skip lists are an ideal choice.
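As a rough illustration of what "combining lists" can look like, here is a sketch of intersecting two sorted lists of document ids with fixed-stride skip pointers. This is the general textbook technique, not Lucene's actual implementation (which uses multi-level skip data). The function name and the square-root stride are assumptions made for the example.

```python
import math

def intersect_with_skips(a, b):
    """Intersect two sorted lists of unique doc ids using fixed-stride skips."""
    skip_a = max(1, int(math.sqrt(len(a))))  # illustrative stride choice
    skip_b = max(1, int(math.sqrt(len(b))))
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Jump ahead in a while the skip target does not overshoot b[j].
            while i + skip_a < len(a) and a[i + skip_a] <= b[j]:
                i += skip_a
            if a[i] < b[j]:
                i += 1
        else:
            # Symmetric case: jump ahead in b.
            while j + skip_b < len(b) and b[j + skip_b] <= a[i]:
                j += skip_b
            if b[j] < a[i]:
                j += 1
    return out

evens = list(range(0, 100, 2))   # stand-in for the postings of one index
threes = list(range(0, 100, 3))  # stand-in for the postings of another
print(intersect_with_skips(evens, threes))
```

Each index stays separate, and the skips let the intersection jump over runs that cannot match, so no combined index has to be built ahead of time.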

Rahul
0

For an on-disk DBMS that caches in units of page blocks, I agree with templatetypedef's answer and the comment below:

@JosephGarvin: A skip list that stores B elements per node would require O(log(n/B)) = O(log(n) - log(B)) lookups (branching factor is still 2). To match a B-tree, you'd have to also group the "skip" links within each layer into blocks, and at that point you've basically got a B-tree.
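To make the arithmetic in that comment concrete, here is a small sketch comparing the three lookup counts. Constant factors are ignored, and the function names and the values of n and B are made up for illustration.

```python
import math

def plain_skiplist_lookups(n):
    """One key per node: about log2(n) nodes visited."""
    return math.log2(n)

def packed_skiplist_lookups(n, b):
    """B keys per node, branching factor still 2: about log2(n / B)."""
    return math.log2(n / b)

def btree_lookups(n, b):
    """Branching factor B: about log(n) / log(B)."""
    return math.log2(n) / math.log2(b)

n, b = 2**30, 2**10  # hypothetical key count and keys per block
print(plain_skiplist_lookups(n))      # 30.0
print(packed_skiplist_lookups(n, b))  # 20.0
print(btree_lookups(n, b))            # 3.0
```

Packing subtracts log2(B) from the count, while the B-tree divides by it, which is why the packed skip list still falls well short of the B-tree.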

Skip lists also have a disadvantage from the perspective of handling concurrent access.

The article "Implementation of on-disk concurrent skip list for an alternative of B-tree Index" describes it as follows:

  • Efficient parallel access is harder to achieve than with B-tree variants
    • In both skip lists and B-tree variants, threads start their search from the same starting point. If one of several threads following the same route acquires a W-lock on a node first, it contends with the other threads, and the throughput of parallel access decreases
    • However, in a skip list, if a thread acquires a W-lock on a node, other threads that want to pass through the locked node are blocked at that node on all levels up to the node's level
      • The fundamental difference may be that, unlike in B-tree variants, branch and leaf (node) locks are not separated in a skip list

In addition, the article describes another disadvantage.

  • A range scan (roughly, iterating over a specified range) of entries can only be performed in the direction chosen when the data structure was designed
    • Scanning in both directions may be possible, but the logic (especially for parallel access) may become significantly more complex if you want to support it without giving up too much processing efficiency
bustab