
B-trees are said to be particularly useful for huge amounts of data that cannot fit in main memory.

My question is: how do we decide the order of a B-tree, i.e. how many keys to store in a node, or how many children a node should have?

Everywhere I look, people use 4 or 5 keys per node. How does that solve the huge-data and disk-read problem?

templatetypedef
Andy897
  • As you said, B Trees are useful when a lot of disk reads are required. In that case data is read in blocks. So the order of the tree is determined by the block size, key field size and pointer size. – Abhishek Bansal Feb 23 '15 at 16:16

1 Answer


Typically, you'd choose the order so that each node is as large as possible while still fitting into a single page of the underlying block device. If you're building a B-tree for an on-disk database, you'd probably pick the order such that each node fits into a single disk page, thereby minimizing the number of disk reads and writes needed for each operation. If you wanted to build an in-memory B-tree, you'd likely target the cache line size of your L2 or L3 cache and try to fit as many keys as possible into a node without exceeding that size. In either case, you'd have to look up the specs to determine what size to use.
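To make the sizing rule concrete, here is a rough sketch in Python. The function names, the 4096-byte page, and the 8-byte key and pointer sizes are assumptions picked for illustration, not values taken from any particular database, and the layout (k keys plus k + 1 child pointers, optional header) is a simplification of what real on-disk nodes store. The second function estimates the best-case number of levels when every node is full.

```python
def max_keys_per_node(page_size, key_size, pointer_size, header_size=0):
    """Largest k such that k keys and k + 1 child pointers fit in one page.

    Assumed layout (hypothetical): header + k keys + (k + 1) pointers.
    """
    # Reserve room for the header and the one extra child pointer.
    usable = page_size - header_size - pointer_size
    if usable < key_size + pointer_size:
        raise ValueError("page too small to hold even one key")
    return usable // (key_size + pointer_size)


def levels_needed(num_keys, keys_per_node):
    """Best-case number of levels to hold num_keys if every node is full."""
    fanout = keys_per_node + 1
    levels = 1
    capacity = keys_per_node  # keys held by a single full root
    while capacity < num_keys:
        levels += 1
        # Adding a level: a new full root on top of `fanout` full subtrees.
        capacity = capacity * fanout + keys_per_node
    return levels


if __name__ == "__main__":
    k = max_keys_per_node(page_size=4096, key_size=8, pointer_size=8)
    print("keys per node:", k)
    print("levels for 1e9 keys:", levels_needed(10**9, k))
    print("levels with 4 keys per node:", levels_needed(10**9, 4))
```

Under these assumptions a page holds 255 keys, so a billion keys fit in 4 levels, whereas 4 keys per node would need 13. Since each level costs roughly one page read during a lookup, that gap is the reason for sizing nodes to the page. Real trees are rarely completely full, so treat these numbers as a lower bound on height.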

Of course, you could always just experiment and try to determine this empirically as well. :-)

Hope this helps!

templatetypedef
  • Thanks a lot for a great answer. But what is the point of building an in-memory B-tree? I was under the impression that B-trees are a better option than other self-balancing trees (AVL, red-black, etc.) only when we have huge data that cannot fit in memory. – Andy897 Feb 23 '15 at 16:48
  • And when all the examples put 4 or 5 integers in a node, is that just for illustration, with no practical significance? – Andy897 Feb 23 '15 at 16:49
  • B-trees were originally designed for databases, but because of memory caches they're sometimes used in main memory these days. As for 4 or 5, my guess is that they're just examples. From experience teaching data structures, it's really hard to fit larger B-trees onto a slide deck. – templatetypedef Feb 23 '15 at 19:19
  • Thanks a lot again. Can I please request your kind attention to another silly query here - http://stackoverflow.com/questions/28680730/keeping-avl-tree-balanced-without-rotations – Andy897 Feb 23 '15 at 20:09