How to remove children of tree nodes with single child

Question

I have an array holding the pre-order traversal of a tree (the node values are depths). I want to minimize the tree by removing the children of internal nodes that have only one child.

As an example (a tree with max depth = 3):

Input array: [0, 1, 2, 3, 3, 1, 2, 3]
Output array: [0, 1, 2, 2, 1]

What should the algorithm be?

What have you tried? Hint: Can you recreate the tree from that information? — Pham Trung, Feb 02 '16 at 10:27
Sorry, I did not understand what you meant. From the output array it is not possible to recreate the input tree, but that is not my purpose anyway. Could you explain more, please? — guezara, Feb 02 '16 at 11:02
The example doesn't match your description. You are deleting both children of the leftmost 2 node. — n. m. could be an AI, Feb 02 '16 at 11:10
I agree, you should clarify the question. I answered for the example. — kfx, Feb 02 '16 at 11:18
Actually, it shifts one level up (as I delete the leftmost "1" node - the only node with a single child). I do not delete the node in the 2nd level. Sorry for my possibly poor explanation and English; could you help me explain such a problem? — guezara, Feb 02 '16 at 11:25

kfx · Accepted Answer · 2016-02-02T11:19:57.783

A simple algorithm with O(nlog(n)) average-case complexity follows from a divide-and-conquer approach.

Start with input_level = 0, output_level = 0, left = 0, right = n-1.

In each recursive step, A[left] is the current node and input_level is its depth in the input. Count the elements with value input_level+1 in the range [left, right] of the input array A. These are the children of the current node. If there are no such elements, the node is a leaf: print output_level and return. If there is exactly one such element, "delete" the current node (i.e. do not print it), increase left by 1, and call the function recursively with output_level unchanged. If there are two or more such elements, print output_level, increase output_level by 1, and apply the function recursively to each of the intervals demarcated by the child elements (each interval starts at a child and ends just before the next child, or at right for the last one). Always increase input_level by 1 when making the recursive call.

For the example input A=[0, 1, 2, 3, 3, 1, 2, 3], the algorithm would first find the elements with value 1, at indexes 1 and 5. Then it would print 0, increase output_level and input_level by 1, and call itself recursively twice: on the ranges [1, 4] and [5, 7].
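The recursion described above can be sketched in C++ as follows. This is only an illustration of the steps, assuming the input is a well-formed pre-order depth sequence; the names `trim` and `trim_divide_and_conquer` are hypothetical, and the output is collected in a vector rather than printed. Note that the recursion depth equals the height of the tree, so very deep inputs may need an explicit stack instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: processes the subtree stored in a[left..right]
// (inclusive), whose root a[left] has depth input_level in the input.
void trim(const std::vector<unsigned>& a, unsigned input_level,
          unsigned output_level, std::size_t left, std::size_t right,
          std::vector<unsigned>& out) {
  assert(left <= right && a[left] == input_level);

  // Collect the indexes of the children of the current node.
  std::vector<std::size_t> children;
  for (std::size_t i = left; i <= right; ++i) {
    if (a[i] == input_level + 1) children.push_back(i);
  }

  if (children.empty()) {  // leaf: keep it at the current output level
    out.push_back(output_level);
    return;
  }
  if (children.size() == 1) {  // single child: drop the current node
    trim(a, input_level + 1, output_level, left + 1, right, out);
    return;
  }

  // Two or more children: keep the node and recurse into each child's range.
  out.push_back(output_level);
  for (std::size_t k = 0; k < children.size(); ++k) {
    const std::size_t end =
        (k + 1 < children.size()) ? children[k + 1] - 1 : right;
    trim(a, input_level + 1, output_level + 1, children[k], end, out);
  }
}

// Hypothetical entry point: returns the trimmed pre-order depth sequence.
std::vector<unsigned> trim_divide_and_conquer(const std::vector<unsigned>& a) {
  std::vector<unsigned> out;
  if (!a.empty()) trim(a, 0, 0, 0, a.size() - 1, out);
  return out;
}
```

For the question's input {0, 1, 2, 3, 3, 1, 2, 3} this returns {0, 1, 2, 2, 1}.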

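Each recursion level scans every element of the array at most once, so the running time is proportional to n times the height of the tree. That is O(n²) in the worst case (for the degenerate tree that is in fact a list), but O(nlog(n)) on average, assuming a random tree has height O(log(n)).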

Lingxi · Answer 2 · 2016-02-02T16:41:24.963

The following algorithm runs in O(N). I think I got it right this time.

#include <algorithm>
#include <iostream>
#include <stack>
#include <tuple>
#include <utility>
#include <vector>

void mark_nodes(const std::vector<unsigned>& tree,
                std::vector<bool>& mark) {
  // {depth, index, mark?}
  using triple = std::tuple<unsigned, unsigned, bool>;
  std::stack<triple> stk;
  stk.push({0, 0, false});
  for (auto i = 1u; i < mark.size(); ++i) {
    auto depth = tree[i];
    auto top_depth = std::get<0>(stk.top());
    if (depth == top_depth) {
      stk.pop();
      if (stk.size()) std::get<2>(stk.top()) = false;
      continue;
    }
    if (depth > top_depth) {
      std::get<2>(stk.top()) = true;
      stk.push({depth, i, false});
      continue;
    }
    while (std::get<0>(stk.top()) != depth) {
      mark[std::get<1>(stk.top())] = std::get<2>(stk.top());
      stk.pop();
    }
    mark[std::get<1>(stk.top())] = std::get<2>(stk.top());
    stk.pop();
    if (stk.size()) std::get<2>(stk.top()) = false;
    stk.push({depth, i, false});
  }
  mark[0] = false;
}

std::vector<unsigned> trim_single_child_nodes(
    std::vector<unsigned> tree) {
  tree.push_back(0u);
  std::vector<bool> mark(tree.size(), false);
  mark_nodes(tree, mark);
  std::vector<unsigned> ret(1, 0);
  tree.pop_back();
  mark.pop_back();
  auto max_depth = *std::max_element(tree.begin(), tree.end());
  std::vector<unsigned> depth_map(max_depth + 1, 0);
  for (auto i = 1u; i < tree.size(); ++i) {
    if (mark[i]) {
      if (tree[i] > tree[i - 1]) {
        depth_map[tree[i]] = depth_map[tree[i - 1]];
      }
    } else {
      if (tree[i] > tree[i - 1]) {
        depth_map[tree[i]] = depth_map[tree[i - 1]] + 1;
      }
      ret.push_back(depth_map[tree[i]]);
    }
  }
  return ret;
}

int main() {
  std::vector<unsigned> input = {0, 1, 2, 3, 3, 1, 2, 3};
  auto output = trim_single_child_nodes(input);
  for (auto depth : output) {
    std::cout << depth << ' ';
  }
}

It does not seem to work properly, e.g. {0,1,2,1,2} gives the output {0,1,2,1}. It should be {0,1,1} instead. — guezara, Feb 02 '16 at 14:51
@guezara Yes. The case of leaf node seems difficult to handle. — Lingxi, Feb 02 '16 at 14:58
An O(N) solution would be perfect for me @Lingxi, as my data is really huge, but I did not understand all the details of your solution. E.g. the input [0, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 4, 5, 2, 3, 4, 5, 1, 2, 3, 4, 5] gives the output [ 0, 1, 0, 1, 2, 2, 1, 1 ]. Instead it should be [ 0, 1, 1, 2, 2, 1, 1 ]. Can you guess why? Thanks — guezara, Feb 03 '16 at 09:12
@guezara `mark_nodes()` is most probably correct, which marks the nodes to be trimmed. Problem is how to use that information to compute the final result. I guess I can't push it any further myself. Sorry about that. I've tried but failed. Yet, I do believe a linear time algorithm exists. Gonna leave the answer open here anyway. Hopefully, more capable people would see it and make it complete. — Lingxi, Feb 03 '16 at 16:02
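One possible way to get linear time is a two-pass approach: first count each node's children, then assign output depths in a second pass. The sketch below does not complete the code above. It applies the same rule as the accepted answer (a node with exactly one child is dropped and its child takes its place), and it assumes the input is a well-formed pre-order depth sequence. The name `trim_linear` and the helper arrays are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: tree[0] == 0 and every later depth d satisfies
// 1 <= d <= (previous depth) + 1, i.e. a valid pre-order depth sequence.
std::vector<unsigned> trim_linear(const std::vector<unsigned>& tree) {
  if (tree.empty()) return {};
  assert(tree[0] == 0);
  const std::size_t n = tree.size();
  const unsigned max_depth = *std::max_element(tree.begin(), tree.end());

  // Pass 1: count children. In pre-order, the parent of a node at depth d
  // is the most recently seen node at depth d - 1.
  std::vector<std::size_t> last_at_depth(max_depth + 1, 0);
  std::vector<unsigned> child_count(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    const unsigned d = tree[i];
    last_at_depth[d] = i;
    if (d > 0) ++child_count[last_at_depth[d - 1]];
  }

  // Pass 2: child_out[d] is the output depth that the children of the most
  // recent node at input depth d will receive.
  std::vector<unsigned> child_out(max_depth + 1, 0);
  std::vector<unsigned> result;
  for (std::size_t i = 0; i < n; ++i) {
    const unsigned d = tree[i];
    const unsigned out = (d == 0) ? 0 : child_out[d - 1];
    if (child_count[i] == 1) {
      child_out[d] = out;      // node dropped: its child inherits its depth
    } else {
      result.push_back(out);   // leaf or branching node: keep it
      child_out[d] = out + 1;
    }
  }
  return result;
}
```

Both passes touch each element once, so the running time is O(N) with O(N) extra memory. For {0, 1, 2, 3, 3, 1, 2, 3} this returns {0, 1, 2, 2, 1}, and for {0, 1, 2, 1, 2} it returns {0, 1, 1}.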
