0

I'm trying to build a search tree on top of my results. Some kind of k-ary tree with n leafs at the end. I'm looking for a C++ solution and experimenting with std::vector but can't get it done as I need memory consistency. It could be done by nested vectors but I can't do that.

Let me explain the details by example:

A unsorted result could be

Result R = { 4, 7, 8, 3, 1, 9, 0, 2, 2, 9, 6 }

On top of that I need a tree with nodes wich in my specific problem are centroids. But to keep it simple I will use artificial values here.

I define the search tree dimensions as

Height H = 2
Branch B = 3

The tree at first

4 7 8 3 1 9 0 2 2 9 6

Second step

layer_0          1.6    5      8.2
                   |    |        |
         +-+-+-+-+-+  +-+  +-+-+-+
         | | | | | |  | |  | | | |
layer_1  3 1 0 2 2 3  4 6  7 8 9 9

Last step

layer_0                 1.6          5           8.2
                          |          |             |
                  +---+---+    +-+---+    +---+----+
layer_1         0.8 1.6 2.4  4.2 5 5.8  6.4 8.2  8.4
                  |       |    |     |    |   |    |
                +-+   +-+-+    |     |    |   |  +-+
layer_2         1 0   2 2 3    4     6    7   8  9 9

This last tree is not a k-ary tree as the end-leafs sizes are 0 <= size <= |R|.

At this moment I'm experimenting with two vectors.

std::vector<size_t> layer_2;

std::vector<float> leafs;

std::size_t width, height;

With help of width and height it would be possible to navigate through leafs. But I'm questioning myself how to elegantly connect leafs and layer_2?

How would a good solution look like?

user1587451
  • 978
  • 3
  • 15
  • 30
  • _"... as I need memory consistency."_ What does this mean? That you need **contiguous** memory? Something else? And why are your trees full of values that don't appear in your original set? And why are your leaves not sorted? – Useless Mar 14 '17 at 15:43
  • Yes, contiguous memory is the correct way. Sorry for my english. The leaves are sorted. The tree uses mean values for the upper leaves. The leaves at the end (4, 7, 8, 3, ...) are grouped into groups with its values most near to the parent. – user1587451 Mar 14 '17 at 15:48
  • OK, so you have fixed-depth 3-ary tree, and a separate 1-level partial 3-ary tree ... this seems unnecessarily complex. Ah, I see, you need the subtree centroid values for some reason. What are you actually trying to achieve? – Useless Mar 14 '17 at 15:55
  • It's not a fixed depth. Just in this example. The depth varies between 2 and 4 (8<=k<=12 and 2<=h<=4). I wanna build a tree with centroids as leaf nodes except at the end. There, I wanna store my already obtained values. A vocabulary tree. – user1587451 Mar 14 '17 at 16:28

1 Answers1

0

Note: This solution is continuous in the sense that it uses a contiguous data structure (vector or array) instead of a Node pointer style tree, but has the potential to have unused space within the data structure depending on the application.

A case where this approach wastes a lot of space: The max amount of branches per node is large but most nodes actually have far fewer children. This will have no effect on how long it takes to find the leafs though. In fact it is a trade off to make that bit quite fast.

consider a 3 branch tree with 4 levels in continuous memory: R,a,b,c,aa,ab,ac,ba,bb,bc,ca,cb,cc,aaa,aab,aac,aba,abb,abc,baa.... where the index of a node's children ranges from (parent_index*3)+1 to (parent_index*3)+3

The important caveat I alluded to is that every node must always have it's three child spaces in the vector, array, whatever. If a node has say only 2 children, just fill that extra space with a null_child value to hold the space. (this is where the wasted space comes from)

The advantage is that now, finding all of the leaves is easy.

first_leaf_index = 0
for(i=0;i<(4-1);i++)//in this example 4 is the depth
    first_leaf_index += 3^(i) //three is max branches per node

At that point just iterate to the end of the data structure

Cameron White
  • 245
  • 1
  • 10