I use a self-balancing binary search tree (currently it is an AVL tree but can be substituted with another one). I noticed that there are distinct periods when only certain operation is performed: large deletion or insertion batches are executed rarely while most of the time it is immutable search tree. Will there be any perfomance gain if i postpone rebalancing to the end of the batch?
-
what is the size of a tree, size of a deletion batch and are keys in batch sequential or random? – AdamSkywalker Oct 27 '17 at 09:20
-
For batch insertions, insert all the new items into an empty AVL tree, and then merge the two trees. See http://www.geeksforgeeks.org/merge-two-balanced-binary-search-trees/. – Jim Mischel Oct 27 '17 at 15:50
-
2Can you give us a sense of, relatively speaking, how many insertions you’re doing, how many deletions you’re doing, and how big the trees are before this? That might inform the answer here. – templatetypedef Oct 27 '17 at 16:17
2 Answers
For batch insertions, insert the new items into a separate AVL tree, and then merge the two trees as described in http://www.geeksforgeeks.org/merge-two-balanced-binary-search-trees/. Merging the trees is an O(m+n) operation, where m and n are the sizes of the two trees. This approach should be much faster when the number of new items is small compared to the number of items in the existing tree.
For batch deletions, mark the items in the AVL tree as deleted. Then, do an inorder traversal of the tree, building a new AVL tree from the non-deleted nodes. See http://www.geeksforgeeks.org/sorted-array-to-balanced-bst/ for an example. Building a new tree from a sorted list of nodes is an O(n) operation.

- 131,090
- 20
- 188
- 351
The asymptotic complexity of say AVL Tree is O(log(N)). But the real computations is obviously decided at runtime.
Delaying rebalancing can make it look fast, but there are risks involved. The avg time to insert is O(logN) only when the tree is balanced. If the batch inserts made the tree highly skewed, we could be getting O(N) insertions time.
Anyway the insert would take O(logN), so rebalancing wouldn't have any much impact as it will also be O(logN).
To get the actual benefits, you need consider batch size of insertion, and search during the insertion/deletion and value of keys which are getting inserted.

- 2,959
- 8
- 30
- 60