
We're given a directed tree to work with. We define the concepts of p-ancestor and p-cousin as follows:

p-ancestor: A node is a 1-ancestor of another node if it is that node's parent. It is a p-ancestor of a node if it is the parent of that node's (p-1)-ancestor.

p-cousin: A node is a p-cousin of another node if they share the same p-ancestor.

For example, consider the tree below.

[example tree image]

4 has three 1-cousins, i.e. 3, 4 and 5, since they all share the common 1-ancestor, which is 1.

For a particular tree, the problem is as follows. You are given multiple pairs of (node,p) and are supposed to count (and output) the number of p-cousins of the corresponding nodes.

A slow algorithm would be to crawl up to the p-ancestor and run a BFS for each node.
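
For reference, a minimal sketch of this slow approach (the parent[] array and children[] adjacency list are assumed inputs, not part of the statement above; the count includes the node itself, as in the example):

import java.util.*;

// A sketch of the slow approach: climb p steps up to the p-ancestor, then BFS
// downward from it and count the nodes exactly p levels below.
// parent[root] is assumed to be -1; children[] is the adjacency list.
class SlowCousins {
    static int count(int node, int p, int[] parent, List<Integer>[] children) {
        int ancestor = node;
        for (int i = 0; i < p; i++) {
            if (parent[ancestor] == -1) return 0;   // node has no p-ancestor
            ancestor = parent[ancestor];
        }
        int cousins = 0;
        Deque<int[]> queue = new ArrayDeque<>();    // entries: {node, depth below ancestor}
        queue.add(new int[]{ancestor, 0});
        while (!queue.isEmpty()) {
            int[] cur = queue.poll();
            if (cur[1] == p) { cousins++; continue; }
            for (int child : children[cur[0]])
                queue.add(new int[]{child, cur[1] + 1});
        }
        return cousins;                             // includes node itself, as in the example
    }
}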

What is the (asymptotically) fastest way to solve the problem?

  • How are the nodes stored? Is the only information you have for one node its ancestor? – rst Feb 17 '16 at 07:56
  • @PhamTrung Can you explain that in an answer? – Agnishom Chattopadhyay Feb 17 '16 at 08:38
  • Do you need to output just the *number of* p-cousins, or the actual p-cousins themselves? If the latter, then the only thing stopping the "crawl up to the p-ancestor and run a BFS" from having optimal time complexity is the possibility of single-child nodes: if every node has >= 2 children, then there must always be >= p p-cousins to output and <= p internal nodes to visit during the BFS, so the costs of crawling up the tree and visiting internal nodes during the BFS can both be amortised across the costs of outputting the results. – j_random_hacker Feb 17 '16 at 14:45
  • Just count the cousins – Agnishom Chattopadhyay Feb 19 '16 at 00:20

3 Answers


If an off-line solution is acceptable, two depth-first searches can do the job.

Assume that we index all n queries (node, p) from 0 to n - 1.

We can convert each query (node, p) into another type of query (ancestor, p) as follows:

The answer for a query (node, p), where node is at level a (the distance from the root to node is a), is the number of level-a descendants of node's ancestor at level a - p. So, for each query, we first find that ancestor:

Pseudo code

dfs(int node, int level, int[] path, int[] ancestorForQuery, List<Query>[] data){
    path[level] = node;                                  // path[0..level] holds the root-to-node path
    for(int child : children[node])                      // children[node] = list of children of node
        dfs(child, level + 1, path, ancestorForQuery, data);
    for(Query query : data[node])
        if(query.p <= level)                             // the p-ancestor exists
            ancestorForQuery[query.index] = path[level - query.p];
}

Now, after the first DFS, each original query has been converted into a query of the new form (ancestor, p).

Assume that we maintain an array count, where count[i] stores the number of nodes at level i seen so far during the second DFS. Suppose node a is at level x and we need to count its descendants p levels below it (at level x + p); the result for this query is:

query result = count[x + p] after we visit a -  count[x + p] before we visit a

Pseudo code

dfs2(int node, int level, int[] result, int[] count, List<TransformedQuery>[] data){
    count[level]++;                                      // one more node seen at this level
    for(TransformedQuery query : data[node])
        result[query.index] -= count[level + query.p];   // snapshot before exploring the subtree
    for(int child : children[node])                      // children[node] = list of children of node
        dfs2(child, level + 1, result, count, data);
    for(TransformedQuery query : data[node])
        result[query.index] += count[level + query.p];   // snapshot after exploring the subtree
}

The result of each query is stored in the result array.
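
The answer doesn't show how the two passes are connected; a hypothetical glue step (the Query/TransformedQuery holders and the regrouping code below are assumptions, not part of the original answer) could look like this:

import java.util.*;

// Hypothetical glue between the two DFS passes: after the first pass has
// filled ancestorForQuery[], re-bucket every answerable query under the
// ancestor it was mapped to, keeping the same p and the same index.
// ancestorForQuery is assumed to be filled with -1 before the first DFS runs.
class Query            { int node, p, index; }
class TransformedQuery { int p, index; }

class Regroup {
    static List<TransformedQuery>[] byAncestor(int n, Query[] queries, int[] ancestorForQuery) {
        List<TransformedQuery>[] data = new List[n];
        for (int v = 0; v < n; v++) data[v] = new ArrayList<>();
        for (Query q : queries) {
            int ancestor = ancestorForQuery[q.index];
            if (ancestor == -1) continue;            // the p-ancestor did not exist
            TransformedQuery t = new TransformedQuery();
            t.p = q.p;                               // same p: count[level(ancestor) + p] == count[level(node)]
            t.index = q.index;
            data[ancestor].add(t);
        }
        return data;                                 // passed to dfs2 as its data argument
    }
}

Note that result[i] then counts the query node itself among its p-cousins, which matches the example in the question.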

Pham Trung

If p is fixed, I suggest the following algorithm:

Let's say that count[v] is the number of p-children of v; initially all count[v] are set to 0. Also, pparent[v] is the p-parent of v.

Let's now run a DFS on the tree and keep a stack of the nodes on the current path, i.e. when we visit some v, we push it onto the stack, and once we leave v, we pop it.

Suppose we've come to some node v in our DFS. Let's do count[stack[size - p]]++, indicating that v is a p-child of the node p levels above it on the stack. Also set pparent[v] = stack[size - p].

Once the DFS is finished, you can calculate the desired number of p-cousins of v like this: count[pparent[v]].
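
A rough sketch of how this might look (the explicit stack, the children[] adjacency list and the -1 sentinel for "no p-ancestor" are my own assumptions; with the indexing used here the p-parent sits at stack[size - 1 - p] after v is pushed):

import java.util.*;

// Sketch of the fixed-p idea: one DFS fills count[] and pparent[], after
// which every query is answered in O(1).
class FixedP {
    final int p;
    final List<Integer>[] children;
    final int[] count;      // count[v] = number of p-children of v
    final int[] pparent;    // pparent[v] = p-parent of v, or -1 if it does not exist
    final int[] stack;      // nodes on the current root-to-node path
    int size = 0;

    FixedP(int n, List<Integer>[] children, int p) {
        this.p = p;
        this.children = children;
        count = new int[n];
        pparent = new int[n];
        stack = new int[n];
    }

    void dfs(int v) {
        stack[size++] = v;                      // push v
        if (size > p) {
            int anc = stack[size - 1 - p];      // the node p levels above v
            count[anc]++;                       // v is a p-child of anc
            pparent[v] = anc;
        } else {
            pparent[v] = -1;                    // v is too shallow to have a p-ancestor
        }
        for (int child : children[v]) dfs(child);
        size--;                                 // pop v when leaving it
    }

    // Number of p-cousins of v (including v itself, as in the question's example).
    int query(int v) {
        return pparent[v] == -1 ? 0 : count[pparent[v]];
    }
}

dfs(root) is run once up front; each query is then answered in O(1), matching the stated complexity.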

The complexity of this is O(n + m) for the DFS and O(1) for each query.

user3829451

First I'll describe a fairly simple way to answer each query in O(p) time that uses O(n) preprocessing time and space, and then mention a way that query times can be sped up to O(log p) time for a factor of just O(log n) extra preprocessing time and space.

O(p)-time query algorithm

The basic idea is that if we write out the sequence of nodes visited during a DFS traversal of the tree in such a way that every node is written out at a vertical position corresponding to its level in the tree, then the set of p-cousins of a node forms a horizontal interval in this diagram. Note that this "writing out" looks very much like a typical tree diagram, except without lines connecting nodes, and with parent nodes (if a postorder traversal is used; preorder would be just as good) always appearing to the right of their children. So given a query (v, p), what we will do is essentially:

  1. Find the p-th ancestor u of the given node v. Naively this takes O(p) time.
  2. Find the p-th left-descendant l of u -- that is, the node you reach after repeating the process of visiting the leftmost child of the current node, p times. Naively this takes O(p) time.
  3. Find the p-th right-descendant r of u (defined similarly). Naively this takes O(p) time.
  4. Return the value x[r] - x[l] + 1, where x[i] is a precalculated value that records the number of nodes in the sequence described above that are at the same level as, and at or to the left of, node i. This takes constant time.

The preprocessing step is where we calculate x[i], for each 1 <= i <= n. This is accomplished by performing a DFS that builds up a second array y[] that records the number y[d] of nodes visited so far at depth d. Specifically, y[d] is initially 0 for each d; during the DFS, when we visit a node v at depth d, we simply increment y[d] and then set x[v] = y[d].
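
A possible sketch of this scheme (the names x, y and the left-to-right children[] list are assumptions; as in the simplified description above, the query assumes the leftmost/rightmost chains from u actually reach depth p):

import java.util.*;

// Sketch of the O(p)-per-query scheme: x[v] = how many nodes at v's level
// appear at or before v in a left-to-right DFS; y[d] = nodes seen so far at
// depth d (used only during preprocessing).
class PCousinsCount {
    final int[] parent;               // parent[root] == -1
    final List<Integer>[] children;   // children listed left to right
    final int[] x, y;

    PCousinsCount(int n, int root, int[] parent, List<Integer>[] children) {
        this.parent = parent;
        this.children = children;
        x = new int[n];
        y = new int[n];
        dfs(root, 0);                 // O(n) preprocessing
    }

    void dfs(int v, int depth) {
        y[depth]++;                   // one more node seen at this depth
        x[v] = y[depth];
        for (int c : children[v]) dfs(c, depth + 1);
    }

    // Number of p-cousins of v (including v itself); 0 if v has no p-ancestor.
    int query(int v, int p) {
        int u = v;
        for (int i = 0; i < p; i++) {                 // step 1: climb to the p-ancestor
            if (parent[u] == -1) return 0;
            u = parent[u];
        }
        int l = u, r = u;
        for (int i = 0; i < p; i++) {                 // steps 2-3: leftmost / rightmost descent
            l = children[l].get(0);
            r = children[r].get(children[r].size() - 1);
        }
        return x[r] - x[l] + 1;                       // step 4
    }
}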

O(log p)-time query algorithm

The above algorithm should already be fast enough if the tree is fairly balanced -- but in the worst case, when each node has just a single child, O(p) = O(n). Notice that it is navigating up and down the tree in the first 3 of the above 4 steps that forces O(p) time -- the last step takes constant time.

To fix this, we can add some extra pointers to make navigating up and down the tree faster. A simple and flexible way uses "pointer doubling": For each node v, we will store log2(depth(v)) pointers to successively higher ancestors. To populate these pointers, we perform log2(maxDepth) DFS iterations, where on the i-th iteration we set each node v's i-th ancestor pointer to its (i-1)-th ancestor's (i-1)-th ancestor: this takes just two pointer lookups per node per DFS. With these pointers, moving any distance p up the tree always takes at most log(p) jumps, because the distance can be reduced by at least half on each jump. The exact same procedure can be used to populate corresponding lists of pointers for "left-descendants" and "right-descendants" to speed up steps 2 and 3, respectively, to O(log p) time.
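
A sketch of the pointer-doubling table for the upward direction only (the up[][] table and the plain array-based construction, instead of repeated DFS passes, are my own choices; the same idea would then be repeated for leftmost- and rightmost-descendant pointers):

// Pointer doubling ("binary lifting") for upward jumps: up[i][v] is the
// 2^i-th ancestor of v, or -1 if it does not exist.  parent[root] == -1.
class AncestorJumps {
    final int LOG;
    final int[][] up;

    AncestorJumps(int[] parent) {
        int n = parent.length;
        LOG = 32 - Integer.numberOfLeadingZeros(Math.max(1, n));  // enough levels for any depth < n
        up = new int[LOG][];
        up[0] = parent.clone();                                    // the 2^0-th ancestor is the parent
        for (int i = 1; i < LOG; i++) {
            up[i] = new int[n];
            for (int v = 0; v < n; v++)                            // two lookups per node per level
                up[i][v] = up[i - 1][v] == -1 ? -1 : up[i - 1][up[i - 1][v]];
        }
    }

    // p-th ancestor of v in O(log p) jumps, or -1 if it does not exist.
    int kthAncestor(int v, int p) {
        if (p >= up[0].length) return -1;       // depth is at most n-1, so no such ancestor
        for (int i = 0; i < LOG && v != -1; i++)
            if (((p >> i) & 1) == 1)
                v = up[i][v];
        return v;
    }
}

kthAncestor replaces the O(p) climb in step 1; analogous tables for leftmost- and rightmost-descendants would replace steps 2 and 3.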

j_random_hacker