
I went through the article below and am trying to understand the different data structures used for data persistence. The article states that sequential operations are good for B-trees, but random operations are not.

Article link


Could you please shed some light on this with an example? Thanks in advance.

  • It's answered in the very next paragraph. "So random operations make B-trees problematic, performance-wise, due to hardware limitations—random "modify" operations cause multiple disk IOs." – Raymond Chen Jun 23 '17 at 05:04
  • Yes, random operations cause multiple disk seeks: the required key might be in a leaf, so in the worst case log_k(n) page-level seeks are needed (see the sketch after these comments). The same thing happens if I want to insert a new key: B-tree insertion is a bottom-up approach, and the key's position might be at the root, so again I have to traverse through whole pages, leading to log_k(n) seeks. So my point is: how does a hardware limitation make random modification costly? – Priyaranjan Swain Jun 23 '17 at 05:14
  • You just answered your own question. – Raymond Chen Jun 23 '17 at 15:03
  • [Please don't use images, use text.](https://meta.stackoverflow.com/q/285551/3404097) Please edit clarifications into your question. (Please try to be clearer still than your comment about exactly what your "point" is.) – philipxy Jun 24 '17 at 14:17
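
To put a number on the comment above, here is a quick back-of-the-envelope sketch (Python; `n` and `k` are illustrative values I've chosen, not figures from the article) of how many page-level seeks a single random operation costs in a B-tree with `n` keys and fan-out `k`:

```python
import math

# Illustrative values, not from the article: a billion keys and a
# fan-out of 100 (roughly what one 4-8 KB disk page of keys allows).
n = 1_000_000_000   # keys stored in the B-tree
k = 100             # children per internal node (fan-out)

# A random lookup or insert descends the tree once, reading one page
# per level, i.e. about log_k(n) page-level seeks.
seeks = math.log(n, k)
print(f"~{seeks:.1f} page reads per random operation")  # prints ~4.5
```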

2 Answers


Sequential keys (or monotonically increasing functions) generally don't cause problems for B-trees

That should be: sequences of accesses whose key values are consecutive (or monotonic) generally don't cause problems for B-trees.

The reason is as follows: during sequential accesses, the key values tend to stay in the same B-tree node, whereas random accesses are likely to constantly change B-tree nodes. A B-tree node represents a secondary-storage page, so the former changes secondary-storage pages less frequently and the latter more frequently; hence the former is generally faster and the latter generally slower.
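
To make that concrete, here is a minimal sketch (Python; `page_of`, `PAGE_SIZE`, and `NUM_KEYS` are invented for illustration, and this models a flat array of leaf pages rather than a real B-tree) that counts how often an access sequence has to switch pages. Each switch stands in for one secondary-storage fetch:

```python
import random

PAGE_SIZE = 100   # keys stored per page (a stand-in for B-tree fan-out)
NUM_KEYS = 10_000

def page_of(key):
    # Hypothetical layout: keys 0-99 fill page 0, keys 100-199 page 1, etc.
    return key // PAGE_SIZE

def page_switches(keys):
    """Count how many times the accessed page changes across the sequence."""
    switches, current = 0, None
    for k in keys:
        p = page_of(k)
        if p != current:   # a different page must be fetched from storage
            switches += 1
            current = p
    return switches

sequential = list(range(NUM_KEYS))
randomized = sequential[:]
random.shuffle(randomized)

print("sequential page fetches:", page_switches(sequential))  # NUM_KEYS / PAGE_SIZE = 100
print("random page fetches:    ", page_switches(randomized))  # close to NUM_KEYS
```

Sequential order touches each page exactly once; the shuffled order lands on a different page on almost every access.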

philipxy

The reason random access operations fare worse than sequential operations is hidden in the structure of B-trees. A B-tree node typically contains a set of records, and each node corresponds to an actual disk block on your non-volatile storage. So if you're reading random records, it is highly likely they're stored in different nodes (i.e., different blocks), and you'll have to perform that many disk IOs. With sequential access, on the other hand, consecutive records tend to reside in the same blocks, so far fewer blocks need to be read.
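
A small simulation can show the same effect with a buffer cache in the picture (Python; `BLOCK_SIZE`, `CACHE_BLOCKS`, and the LRU cache are assumptions for illustration, not details from the answer): sequential reads keep hitting the cached block, while random reads miss almost every time, and each miss is a disk IO:

```python
import random
from collections import OrderedDict

BLOCK_SIZE = 100     # records per disk block
CACHE_BLOCKS = 8     # how many blocks fit in the buffer cache at once
NUM_RECORDS = 10_000

def disk_ios(record_ids):
    """Count disk IOs for a read sequence, given a tiny LRU buffer cache."""
    cache = OrderedDict()               # block id -> present (in LRU order)
    ios = 0
    for rid in record_ids:
        block = rid // BLOCK_SIZE       # block that stores this record
        if block in cache:
            cache.move_to_end(block)    # cache hit: no disk IO
        else:
            ios += 1                    # cache miss: read the block from disk
            cache[block] = True
            if len(cache) > CACHE_BLOCKS:
                cache.popitem(last=False)   # evict least recently used block
    return ios

ids = list(range(NUM_RECORDS))
print("sequential reads:", disk_ios(ids))   # NUM_RECORDS / BLOCK_SIZE = 100 IOs
random.shuffle(ids)
print("random reads:    ", disk_ios(ids))   # nearly NUM_RECORDS IOs
```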

On a side note, it turns out that plain B-trees store even sequential records in a rather scattered manner and hence pose performance issues for range queries. This problem is solved by B+ trees, which store sequential records sequentially in the leaf nodes, and the leaves are connected to each other like a linked list.
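
To illustrate that last point, here is a toy sketch (Python; `LeafNode` and `range_scan` are made-up names, and a real B+ tree would first descend from the root to the leaf containing `lo` rather than starting at the leftmost leaf): once the leaves are chained together, a range scan follows sibling pointers sequentially instead of re-descending the tree for every key:

```python
# Toy model of B+-tree leaves: sorted keys chained in a linked list.
class LeafNode:
    def __init__(self, keys):
        self.keys = keys    # sorted keys stored in this leaf "page"
        self.next = None    # pointer to the right sibling leaf

def range_scan(first_leaf, lo, hi):
    """Yield every key in [lo, hi] by walking the leaf chain left to right."""
    leaf = first_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:    # leaves are in key order, so we can stop early
                return
            if key >= lo:
                yield key
        leaf = leaf.next    # one sequential "page" read, no re-descent

# Three leaves holding keys 0..11, four keys per page.
leaves = [LeafNode(list(range(i, i + 4))) for i in (0, 4, 8)]
leaves[0].next, leaves[1].next = leaves[1], leaves[2]

print(list(range_scan(leaves[0], 3, 9)))   # [3, 4, 5, 6, 7, 8, 9]
```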