Found an algorithm that doesn't require additional memory and works for any binary search tree (red-black/AVL/etc):
- rearrange the incoming data array to represent a "flattened" binary tree (root at [0], the root's children at [1] and [2], the left node's children at [3] and [4], the right node's children at [5] and [6], and so on). The trick is to select the root of every subtree in such a way that the resulting binary tree has every level (except the last one) completely filled, and on the last level all nodes form an "uninterrupted" line. Like this:
      N11
     /   \
   N21   N22
   / \   /
 N31 N32 N33
See the code below for how to transform a sorted array into such a sequence. I believe that for any sequence there is only one possible way to arrange it in a binary search tree like that -- i.e. you get a sort of "stability" guarantee (for a given sequence length we know exactly where each element will end up in the tree). A small worked example follows this list.
- then you perform one pass over your data and fill the tree level by level. At each level we know exactly how many elements to pull (2^(lvl-1)) before switching to the next level (or running out of data). At the start of each level we reset our position to the leftmost element (std::set<T>::begin()) and, after inserting the left and right children of a leaf, we move to the next leaf on the current level (a double ++it from the result of the last insert() call).
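For illustration (my own example, not part of the original measurements): repacking the sorted sequence 1..6 gives 4 2 6 1 3 5 -- exactly the tree from the picture with N11 = 4, N21 = 2, N22 = 6, N31 = 1, N32 = 3, N33 = 5, i.e. the root first, then each level left to right. A minimal check, assuming the repack() helper from the code below (drop it into main() there):

    // sample data of my own; repack() is defined in the code below
    std::vector<int> sorted   = { 1, 2, 3, 4, 5, 6 };
    std::vector<int> expected = { 4, 2, 6, 1, 3, 5 };  // N11, N21, N22, N31, N32, N33
    assert(repack(sorted) == expected);                // level-by-level order, as in the picture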
Notes:
- with std::set<int> the performance benefit (compared to a hinted insert of the sorted sequence) is 5-10%
- unfortunately, the MS red-black tree implementation ends up performing a lot of unnecessary work here -- checking neighboring elements (to make sure the insert doesn't break the binary search tree invariant), repainting nodes (a newly inserted node is always red for some reason) and probably something else. Checking neighbors involves additional comparisons and memory accesses, as well as traversing the tree up multiple levels
- the benefit of this approach would be significantly higher if it were implemented internally (not through the std::set public interface) as a function that expects the data to conform to these requirements and declares undefined behavior if it doesn't...
- ...in that case an even better algorithm would populate the tree depth-first and would require the input data to be rearranged differently ([N11, N21, N31, N32, N22, N33] in the example above); we would also end up doing only one tree traversal (see the sketch after this list). Alas, this approach can't be implemented through the std::set public interface -- it enforces the red-black tree invariant at every step of construction, causing unnecessary rebalancing
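To make the depth-first idea concrete, here is a rough sketch over a hypothetical raw node structure (my illustration only -- std::set gives no such access, and node, build and left_count are made-up names). It consumes the preorder-rearranged input ([N11, N21, N31, N32, N22, N33], e.g. 4 2 1 3 6 5 for the 1..6 example) in a single pass:

    // Hypothetical node type -- NOT std::set's internals, purely for illustration.
    struct node { int value; node* left; node* right; };

    // Size of the left subtree for a subtree of 'count' nodes, mirroring pick_root():
    // every level except the last is filled, the last level is an uninterrupted line.
    size_t left_count(size_t count)
    {
        unsigned order = 0;
        while ((size_t(2) << order) <= count) ++order;  // index of the highest set bit of count
        size_t max_tail = size_t(1) << order;           // capacity of the last level
        size_t tail = count - (max_tail - 1);           // nodes actually on the last level
        return (tail >= max_tail/2) ? max_tail - 1 : count - max_tail/2;
    }

    // 'p' walks over the preorder-rearranged input; 'count' is the size of this subtree.
    node* build(int const*& p, size_t count)
    {
        if (count == 0) return nullptr;
        node* n = new node{ *p++, nullptr, nullptr };   // this subtree's root comes first...
        size_t lc = left_count(count);
        n->left  = build(p, lc);                        // ...then its entire left subtree...
        n->right = build(p, count - 1 - lc);            // ...then its entire right subtree
        return n;
    }

(For a red-black tree a valid coloring is also known up front -- e.g. paint the nodes of the partial last level red and everything above black -- so no rebalancing or repainting would be needed at all.)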
Code (MSVC 2015; pardon the potato quality -- it was hacked together in about an hour):
#include <set>
#include <cassert>
#include <vector>
#include <utility>
#include <chrono>
#include <cstdio>
#include <intrin.h>    // _BitScanReverse (MSVC-specific)

using namespace std;

unsigned hibit(size_t n)
{
    unsigned long l;
    // fine for sizes below 2^32; use _BitScanReverse64 for larger inputs
    auto r = _BitScanReverse(&l, static_cast<unsigned long>(n));
    assert(r);
    return l;
}
int const* pick_root(int const* begin, int const* end)
{
    assert(begin != end);
    size_t count = end - begin;
    unsigned tree_order = hibit(count);   // tree height minus 1
    size_t max_tail_sz = 1 << tree_order; // max number of nodes on last tree lvl
    size_t filled_sz = max_tail_sz - 1;   // number of nodes on all but last tree lvl
    size_t tail_sz = count - filled_sz;   // number of nodes on last lvl
    return (tail_sz >= max_tail_sz/2) ?   // left half of tree will be completely filled?
        begin + max_tail_sz - 1           // pick (2^tree_order)'s element from left
        :
        end - max_tail_sz/2;              // pick (2^(tree_order - 1))'s element from right
}
vector<int> repack(vector<int> const& v)
{
    vector<int> r; r.reserve(v.size());
    if (!v.empty())
    {
        unsigned tree_order = hibit(v.size()); // tree height minus 1
        vector<pair<int const*, int const*>> ranges(1, make_pair(&v[0], &v[0] + v.size()));
        for(size_t i = 0; i <= tree_order; ++i)
        {
            vector<pair<int const*, int const*>> ranges2; ranges2.reserve(ranges.size()*2);
            for(auto const& range: ranges)
            {
                auto root = pick_root(range.first, range.second);
                r.push_back(*root);
                if (root != range.first)
                {
                    ranges2.push_back(make_pair(range.first, root));
                    if (root + 1 != range.second)
                        ranges2.push_back(make_pair(root + 1, range.second));
                }
            }
            ranges.swap(ranges2);
        }
        assert(ranges.empty());
    }
    return r;
}
set<int> populate_simple(std::vector<int> const& vec)
{
    set<int> r;
    for(auto v: vec) r.insert(v);
    return r;
}

set<int> populate_hinted(std::vector<int> const& vec)
{
    set<int> r;
    for(auto v: vec) r.insert(r.end(), v);
    return r;
}
set<int> populate_optimized(std::vector<int> const& vec)
{
    set<int> r;
    if (vec.empty()) return r;
    int const* p = &vec[0];
    int const* pend = &vec[0] + vec.size();
    r.insert(*p++);                     // take care of root
    if (p == pend) return r;
    for(size_t count = 1; ; count *= 2) // max number of pairs on each tree lvl
    {
        auto pos = r.begin();
        for(size_t i = 1; ; ++i)
        {
            r.insert(pos, *p++);        // new left child of the leaf at pos
            if (p == pend) return r;
            //++pos; // MS implementation supports insertion after hint
            pos = r.insert(pos, *p++);  // new right child of the same leaf
            if (p == pend) return r;
            // pos points to rightmost leaf of left subtree of "local" tree
            ++pos;                      // pos points to root of "local" tree (or end())
            if (i == count) break;
            ++pos;                      // pos points to leftmost leaf of right subtree of "local" tree
        }
    }
}
struct stopwatch
{
    chrono::high_resolution_clock::time_point start_;

    stopwatch() : start_(std::chrono::high_resolution_clock::now()) {}

    auto click()
    {
        auto finish = std::chrono::high_resolution_clock::now();
        auto mks = std::chrono::duration_cast<std::chrono::microseconds>(finish - start_);
        return mks.count();
    }
};
int main()
{
    size_t N = 100000;
    vector<int> v(N, 0);
    for(unsigned i = 0; i < N; ++i) v[i] = i; // sorted array
    auto rv = repack(v);
    {
        stopwatch w;
        auto s = populate_simple(v);
        printf("simple : %I64d mks\n", w.click());
    }
    {
        stopwatch w;
        auto s = populate_hinted(v);
        printf("hinted : %I64d mks\n", w.click());
    }
    {
        stopwatch w;
        auto s = populate_optimized(rv);
        printf("optimized: %I64d mks\n", w.click());
    }
    return 0;
}
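If you want to try this outside MSVC (my note, not part of the original answer), only two spots are compiler-specific: hibit() and the %I64d format string. A possible portable variant, assuming GCC/Clang builtins (or C++20 <bit>):

    // portable hibit() for GCC/Clang; with C++20 you could use std::bit_width(n) - 1 instead
    unsigned hibit(size_t n)
    {
        assert(n != 0);
        return 63u - static_cast<unsigned>(__builtin_clzll(n)); // index of the highest set bit
    }

    // ...and print the timings with %lld:
    printf("simple : %lld mks\n", static_cast<long long>(w.click()));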
Typical results:
simple : 14904 mks
hinted : 7885 mks
optimized: 6809 mks
simple : 15288 mks
hinted : 7415 mks
optimized: 6947 mks
I'm pretty sure the measurements aren't entirely accurate, but the relationship always holds -- the optimized version is always faster. Also note that the algorithm used to rearrange the elements can probably be improved -- the aim was to optimize tree population, not input data preparation.