3

I searched a bit but haven't found the answer to this question..

I built a non-binary tree, so each node can have any number of children (called n-ary tree i think)

To help with searching, I gave every node a number when i built the tree, so that every node's child nodes will be bigger, and all the node to the right of the it will be bigger as well.

something like this:

tree stucture

This way I get logn time for search

The problem comes when I want to insert nodes. This model would not work if I want to insert nodes anywhere but the end.

I thought of a few ways to do it,

  1. insert the new nodes at the desired location, then update the number of all the nodes "behind" it.

  2. instead of using a single number to represent each node, use an array of numbers. The numbers in the array will represent its location on that specific level. For example, node 1 will be {0}. Node 9 will be {0,2}. Node 7 will be {0, 0, 1, 2}. Now when inserting, I just need to change the numbers on that level.

  3. forget all the numbering and just compare every node until the correct one is found. Insertion don't need to care about numbers either.

My question is, which way would be better? I'm not sure if using an array of integers to represent each node is very fast.. maybe it is still faster than the first way? Are there other ways of going about this?

Thank you in advance.

user1610613
  • 517
  • 8
  • 13
  • What are you searching for -- the numbers that you show or unspecified data which is stored in the nodes? If the latter, I don't see how having the nodes numbered like that gives you `log(n)` search time for the data. What is the link between the data and the numbers? – John Coleman Jan 02 '17 at 03:34
  • Why do you need variable number of children again? – Adrian Colomitchi Jan 02 '17 at 04:06
  • I am searching for the number. They are only created to help me search. There is no other link. – user1610613 Jan 02 '17 at 06:13
  • How do you get log(n) search time from this structure? If I'm searching for node 13, what is the sequence of nodes that are searched, and why? – Jim Mischel Jan 03 '17 at 14:59
  • it will be 1, 2, 8, 9, 10, 11, 12, 13, basically skipping everything under node 2. The way it does this is by comparing the target node number to the searched node number and the next node as well to determine if it should enter the current node. For example, the program looks at 2, then looks at the next node 8, and find that 13 is bigger than 8. So it moves on to 8, and looks at next node 9. 13 is also bigger than 9, so it moves on to 9, and there is no next node, so it enters the sub-nodes of 9. Also I went with option 2 in the end. – user1610613 Jan 12 '17 at 03:27

1 Answers1

2

I gather that the problem you have is to assign a unique identifier to each node in such a way that you can find the node given its unique id in sublinear time.

This is not usually a problem for transient (in-memory) data structures, because typical tree implementations never move a node in memory (until it is deleted). That lets you use the address of the node as a unique identifier, which provides O(1) access. Languages other than C dress this up in an object like a tree iterator or node reference, but under the hood the principle is the same.

However, there are times when you really need to be able to attach a fixed-for-all-time identifier to a tree node, in a way which will be resilient against, for example, persisting the tree to permanent storage and then reserializing it in a different executable image.

One well-known hack is to use floating-point ids. When a new node is inserted, its id is assigned to be the average of its immediate neighbours. For the purpose of this computation, we pretend that there is a node on the left of the tree with id 0.0 and a node to the right with id 1.0, so every node has two neighbours, even if it is the new left- or right-most node. In particular, the root node is given the id 0.5, which is the average of the 0.0 and 1.0 imaginary boundary nodes.

Unfortunately, floating point precision is not infinite, and this hack works best if insertions are always at random places in the tree. If you insert all the nodes at the end, you will rapidly exhaust the floating point precision. You could periodically renumber all the nodes, but that defeats the purpose of having a permanent unchanging unique node id. (For some problem domains, it's acceptable, though.)

Of course, you don't really have to use floating point. A double on standard architecture has 53 bits of precision, which is plenty if your insertions are stochastic and very little if you always insert at the same place; you can use all 64 bits of an unsigned 64-bit integer by (conceptually) locating a fixed binary point prior to the high-order bit. The average computation works the same, except that you need to special case the computation with the 1.0 value.

This is essentially the same as your idea of labelling the nodes with a vector of indices. That scheme has the advantage of never running out of precision, and the disadvantage that the vectors can get quite long. You could also use a hybrid solution where you start a new level only when you run out of precision with the current level.

rici
  • 234,347
  • 28
  • 237
  • 341
  • Thank you for the answer. So if I understand correctly, say I have nodes 1, 2, 3, 4, and I were to add a new node at a position after 2, it would check the next node, which is 3, then added the new node as 2.5. And another one added after 2.5 would be 2.75. This would require minimal change on the structure, but finding the next node will take a little extra work. Thank you again. – user1610613 Jan 02 '17 at 06:20