Number of Nodes in a Balanced Tree

Question

So I came up with an interesting problem and was wondering whether there is an efficient way to solve it. There is a balanced binary tree in which ID numbers are kept (it is not a BST, so the IDs follow no particular ordering). You have a limited number of queries to find out how many nodes there are. It is guaranteed that for every node E, the left subtree of E has either as many nodes as the right subtree of E or exactly one more. What is the optimal way to query the program to find out how many nodes there are? For example, given a tree like this:

          1
      4      2
  3

The program will give the following output:

 Query: 1
 Response: 4 2
 Query: 4
 Response: 3
 Query: 3
 Response: 0 0 
 Query: 2
 Response: 0 0
 Answer: 4

The practical approach would (of course) be to count the nodes as they are inserted into the tree, but I don't think that's the answer you're looking for. — Wintermute, Apr 06 '15 at 00:17
The tree has already been built. I am just querying to find the number of nodes. — user3188300, Apr 06 '15 at 00:19
As I understand the question, it's about whether there's a more efficient way than to visit all nodes, knowing that the tree is balanced. — Wintermute, Apr 06 '15 at 00:24
For example, I should need at most 70,000 queries for a tree that has 10^100 nodes. — user3188300, Apr 06 '15 at 00:26
I feel compelled to point out that a tree with 10^100 nodes would require a 333-bit address space to store, assuming a single byte per node. — Wintermute, Apr 06 '15 at 00:28
To get the rough idea the answer is between (2 to the power of the depth of the tree - 1) + 1 and 2 to the power of the depth of the tree . — Joshua Byer, Apr 06 '15 at 00:32
Yeah, I got to this point, but I could not figure out how to get the exact number of nodes with so few queries. — user3188300, Apr 06 '15 at 00:34
I don't think there is a solution that can provide a better worst case run time than the recursive one — Joshua Byer, Apr 06 '15 at 00:36
It's an interesting question, what you can infer from the configuration of the lowest level of leaf nodes on the left side about the configuration of the adjacent right side (or vice versa). I think it's possible to do something here, but it's not trivial, and I don't know that there's a standard algorithm for it. I shall have to meditate on the question. — Wintermute, Apr 06 '15 at 00:36
In the best case, we can keep going up and checking whether there is no right child. — Joshua Byer, Apr 06 '15 at 00:36
I'm thinking along these lines: There's a lowest level of leaf nodes that can be known by the leftmost and lowest node. The tree does not end more than one node higher than it anywhere. The slots for nodes on this level are a large bitmask -- node there is 1, node not there is 0. Knowing that mask for the n leftmost leaf nodes and how many bits in it are set allows you to infer that at least one less than that nodes in the adjacent n leaf-node spots exist, and because of the condition, the only one that is in question is the slot that corresponds with the lowest set bit in the part of the — Wintermute, Apr 06 '15 at 00:44
Are you able to call the bottom nodes straight away, or do you have to know the value of them? — Hayden, Apr 06 '15 at 00:44
tree that we already know (otherwise some subtree would be heavier on the right side, violating the condition). So you check that, and that allows you to move in steps that double in size. — Wintermute, Apr 06 '15 at 00:44
Hmm...it needs more work, but I think my track is the right one. The bitmask for the n leftmost leaf nodes of the tree differs from that for the adjacent n leaf nodes in at most one place, and the place where it can differ can be computed in advance (since any number of leaf nodes implies exactly one leaf node configuration). The trick is to figure out a performant algorithm to compute that place in advance without holding the bitmask itself in memory (it would be much too large for a tree of this size), just from the number of extant nodes in the n leftmost slots. — Wintermute, Apr 06 '15 at 01:01
A rough and simple idea: create a helper data structure that can directly tell us how many nodes are in the tree. It may cost more memory, but we could use it for many queries. Just trade space for time. — Ron Tang, Apr 06 '15 at 03:55

Wintermute · Answer 1 · 2015-04-06T18:15:46.707

I finally puzzled it out.

From the condition

It is guaranteed that for every node E that the left subtree will have as many or one more node than the right subtree at that node E.

it follows that

1. The number of non-leaf nodes can be calculated from the depth of the tree; it is 2^depth - 1, with the depth counted so that a single node has depth 0 (as in the code below). Therefore the interesting thing to count is the leaf nodes, that is, the nodes on the lowest level.
2. Given the balancing condition, there is always only one place where a new node can be inserted or an existing one removed. (This means that a given number of leaf nodes implies one, and only one, pattern of leaf nodes.)
3. If we know the number of leaf nodes of the left subtree of a node, we know that the number of leaf nodes (and the number of nodes) in the right subtree is either the same or one less than that.
4. It follows from 2. and 3. that there is only one leaf-node slot in the right subtree of which we can't know without inspecting the tree whether it is filled. Finding it is the trick in this algorithm.
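
The claim that a given number of nodes implies exactly one pattern on the lowest level can be checked on small trees. The following is only a sketch: `append_slots` and `leaf_pattern` are names made up for this illustration, and the code builds the whole lowest level in memory, so it is usable for small trees only, not for the 10^100-node case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends one character per slot of the lowest level ('1' = node present,
// '0' = slot empty) for a left-heavy subtree with n nodes whose root sits
// `levels` levels above the lowest level.
void append_slots(std::size_t n, std::size_t levels, std::string &out) {
  if(levels == 0) {
    out += (n > 0 ? '1' : '0');
    return;
  }
  // The balancing condition fixes how the n - 1 non-root nodes are split:
  // the left subtree gets the larger half, the right one the smaller half.
  std::size_t left  = n / 2;
  std::size_t right = n > 0 ? (n - 1) / 2 : 0;
  append_slots(left,  levels - 1, out);
  append_slots(right, levels - 1, out);
}

// Pattern of the lowest level for the (unique) tree shape with n nodes.
std::string leaf_pattern(std::size_t n) {
  assert(n > 0);
  std::size_t depth = 0;                       // depth = floor(log2(n))
  for(std::size_t m = n; m > 1; m /= 2) { ++depth; }

  std::string out;
  append_slots(n, depth, out);
  return out;
}
```

For n = 4, 5, 6, 7 this yields `1000`, `1010`, `1110`, `1111`: each additional node fills exactly one slot, and the slot is determined by the node count alone.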

So, making use of 3: Assume that we have a (sub)tree T. We know the number of leaf nodes in its left subtree is n_left. We know therefore that the number of leaf nodes in its right subtree is either n_left or n_left - 1, and in particular that it is at most n_left.

We step into the right subtree. Knowing the maximum number of leaf nodes in this subtree, and knowing that they are evenly split among the subtrees on both sides, we can infer two things:

1. If the maximum number of leaf nodes in this subtree is odd, then the questionable slot is on the left, since the right side cannot be heavier than the left. If it is even, then the slot is on the right.
2. The maximum number of leaf nodes in each subsubtree is half that of the leaf nodes in the subtree, rounded up on the left, rounded down on the right.
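
These two rules are enough to compute where the questionable slot is without looking at the tree at all. As a minimal sketch (the function name `slot_path` and the `L`/`R` string encoding are assumptions made for this illustration), the path from the root of the right subtree down to the slot can be written out like this:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Path from the root of a right subtree to its questionable slot, given the
// number of leaf nodes in the left sibling (the maximum the right subtree
// can hold) and the number of levels to descend.
std::string slot_path(std::size_t leaf_count, std::size_t depth) {
  assert(leaf_count >= 1);   // the left sibling has at least one leaf node
  std::string path;
  for(; depth > 0; --depth) {
    // Odd maximum: the slot is on the left. Even maximum: on the right.
    path += leaf_count % 2 ? 'L' : 'R';
    // Rounded up when going left (odd), plain half when going right (even).
    leaf_count = (leaf_count + 1) / 2;
  }
  return path;
}
```

With two levels to descend, left-sibling leaf counts of 1, 2, 3 and 4 give the paths `LL`, `RL`, `LR` and `RR`, which is the order in which the slots of the lowest level fill up.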

That solves the heart of the matter; the rest is simple recursion. In C++:

#include <cstddef>

// I'm using a simple node structure, you'd use query functions. The
// algorithm is not meaningfully altered by this.
struct node {
  node *left = nullptr, *right = nullptr;
};

struct node_counter {
  std::size_t leaf;      // number of leaf nodes,
  std::size_t trunk;     // number of trunk nodes,
  std::size_t depth;     // and depth of the inspected subtree.
};

// Interesting function #1: Given a right subtree and the leaf-count and
// depth of its left sibling, find the node that might or might not be there
node const *find_leaf(node const *branch, std::size_t leaf_count, std::size_t depth) {
  // We've gone down, found the slot. Return it.
  if(depth == 0) { return branch; }

  // The heart of the matter: Step into the subtree that contains the
  // questionable slot, with its maximum leaf node count and depth.
  return find_leaf(leaf_count % 2 ? branch->left : branch->right,
                   (leaf_count + 1) / 2, // int division
                   depth - 1);
}

// Recursive counter. This steps down on the left side, then infers the
// number of leaf and trunk nodes on the right side for each level.
node_counter count_nodes_aux(node const *root) {
  // leftmost leaf node is reached. Return info for it.
  if(!root->left) {
    return { 1, 0, 0 };
  }

  // We're in the middle of the tree. Get the counts for the left side,
  auto ctr_left   = count_nodes_aux(root->left);

  // then find the questionable slot on the right
  auto leaf_right = find_leaf(root->right, ctr_left.leaf, ctr_left.depth);

  return {
    // the number of leaf nodes in this tree is double that of the left
    // subtree if the node is there, one less otherwise.
    ctr_left.leaf * 2 - (leaf_right ? 0 : 1),

    // And this is just an easy way to keep count of the number of non-leaf
    // nodes and the depth of the inspected subtree.
    ctr_left.trunk * 2 + 1,
    ctr_left.depth + 1
  };
}

// Frontend function to make the whole thing easily usable.
std::size_t count_nodes(node const *root) {
  auto ctr = count_nodes_aux(root);
  return ctr.leaf + ctr.trunk;
}
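
For the query setting of the question, the same recursion can be phrased against a query function instead of pointers. This is a sketch under assumptions: the `oracle` type below is a hypothetical stand-in for the question's program, answering each query with the IDs of the left and right child and using 0 for "no child" as in the sample output. None of these names come from the original problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

using node_id  = std::uint64_t;                 // 0 means "no such child"
using children = std::pair<node_id, node_id>;   // (left, right)

// Hypothetical stand-in for the question's query program: it knows the
// children of every inner node and counts how often it is asked.
struct oracle {
  std::map<node_id, children> table;
  std::size_t queries = 0;

  children query(node_id id) {
    ++queries;
    auto it = table.find(id);
    return it == table.end() ? children(0, 0) : it->second;
  }
};

struct counts { std::size_t leaf, trunk, depth; };

// Same walk as find_leaf, but every step down costs one query.
node_id find_slot(oracle &o, node_id branch, std::size_t leaf_count, std::size_t depth) {
  for(; depth > 0; --depth) {
    assert(branch != 0);   // levels above the lowest one are assumed full
    children c = o.query(branch);
    branch     = leaf_count % 2 ? c.first : c.second;
    leaf_count = (leaf_count + 1) / 2;
  }
  return branch;           // 0 if the questionable slot is empty
}

counts count_aux(oracle &o, node_id root) {
  children c = o.query(root);
  if(c.first == 0) { return { 1, 0, 0 }; }   // leftmost node of the lowest level

  counts  left = count_aux(o, c.first);
  node_id slot = find_slot(o, c.second, left.leaf, left.depth);

  return { left.leaf * 2 - (slot ? 0 : 1), left.trunk * 2 + 1, left.depth + 1 };
}

std::size_t count_nodes_by_query(oracle &o, node_id root) {
  counts ctr = count_aux(o, root);
  return ctr.leaf + ctr.trunk;
}
```

Filling `table` with the question's example (1 has children 4 and 2, 4 has the single child 3) and calling `count_nodes_by_query(o, 1)` gives 4 after four queries, the same count and number of queries as in the sample output.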

To try this out, I have used the following, exceedingly ugly main function that just builds a tree with many nodes, inserts new ones in the right places and checks if the counter moves in the right way. It is not pretty, it does not follow best practices, and if you write code like this in production, you ought to be fired. It is the way it is because the main point of this answer is the above algorithm, and I didn't see any sense in making this pretty.

#include <iostream>

void fill_node(node *n) {
  n->left  = new node;
  n->right = new node;
}

int main() {
  node *root = new node;

  fill_node(root);

  fill_node(root->left);
  fill_node(root->right);

  fill_node(root->left->left);
  fill_node(root->left->right);
  fill_node(root->right->left);
  fill_node(root->right->right);

  fill_node(root->left->left->left);
  fill_node(root->left->left->right);
  fill_node(root->left->right->left);
  fill_node(root->left->right->right);
  fill_node(root->right->left->left);
  fill_node(root->right->left->right);
  fill_node(root->right->right->left);
  fill_node(root->right->right->right);

  std::cout << count_nodes(root) << std::endl;

  root->left ->left ->left ->left ->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->left ->left ->left ->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->right->left ->left ->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->right->left ->left ->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->left ->right->left ->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->left ->right->left ->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->right->right->left ->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->right->right->left ->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->left ->left ->right->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->left ->left ->right->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->right->left ->right->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->right->left ->right->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->left ->right->right->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->left ->right->right->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->right->right->right->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->right->right->right->left  = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->left ->left ->left ->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->left ->left ->left ->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->right->left ->left ->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->right->left ->left ->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->left ->right->left ->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->left ->right->left ->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->right->right->left ->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->right->right->left ->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->left ->left ->right->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->left ->left ->right->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->right->left ->right->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->right->left ->right->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->left ->right->right->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->left ->right->right->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->left ->right->right->right->right = new node;  std::cout << count_nodes(root) << std::endl;
  root->right->right->right->right->right = new node;  std::cout << count_nodes(root) << std::endl;
}
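
As for the query budget mentioned in the comments on the question (at most 70,000 queries for about 10^100 nodes), a back-of-the-envelope estimate is possible under an assumed cost model: one query per node whose children are read, and no query for the questionable slot itself, because its parent's response already says whether it exists. The function name `estimated_queries` is made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Estimated number of queries for a tree whose lowest level lies `depth`
// levels below the root, under the cost model described above.
std::size_t estimated_queries(std::size_t depth) {
  assert(depth < 100000);   // keeps the product below far away from overflow

  // Walking down the leftmost path reads the root, every node below it on
  // that path, and finally the leftmost node of the lowest level.
  std::size_t spine = depth + 1;

  // A node on that path that sits h levels above the lowest level needs
  // h - 1 further queries to reach the parent of its questionable slot;
  // summed over h = 1 .. depth this is depth * (depth - 1) / 2.
  std::size_t slots = depth > 0 ? depth * (depth - 1) / 2 : 0;

  return spine + slots;
}
```

Since 2^332 <= 10^100 < 2^333, a tree with 10^100 nodes has its lowest level 332 levels below the root, and `estimated_queries(332)` is 55279, which stays below 70,000. For the four-node example in the question, `estimated_queries(2)` is 4.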

"there is always only one place where a new node can be inserted or an existing one removed" → You seem to have a nonstandard definition of "balanced tree". See http://stackoverflow.com/questions/8015630/definition-of-a-balanced-tree — Veedrac, Apr 06 '15 at 17:07
@Veedrac The question is asking about a particular (left-heavy) sort of balanced tree. — Wintermute, Apr 06 '15 at 18:01

Joshua Byer · Answer 2 · 2015-04-06T00:37:46.090

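// Naive recursive count. Assumes a node type with left/right pointers, as in
// the other answer. It visits every node, so it needs one step per node.
int countnodes(node const *ele, int count)
{
 if(ele->right != nullptr)
   {
      count += countnodes(ele->right, 0);
   }
  if(ele->left != nullptr)
  {
     count += countnodes(ele->left, 0);
  }
  return count + 1; // count this node as well
}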

edited Apr 06 '15 at 00:37

answered Apr 06 '15 at 00:29

I do not have explicit access to the elements. I can only query for left and right and because the number of possible nodes is so large I can not store the tree. I do not think this solution will work. – user3188300 Apr 06 '15 at 00:30
you probably want to review your code since this does not return the correct value, even if the OP had access to the tree elements. – SleuthEye Apr 06 '15 at 00:35
