
Is there any known algorithm for building a tree from a set of sets, using a minimum number of nodes?

For example, I have the following sets:

1. {A, B, C}
2. {B, C}
3. {D, B, A}
4. {C, A}

Every set is represented by a path from the root to a leaf. They are sets, so the order in which the nodes appear on the path is not important.

I need a tree using as few nodes as possible that will represent all of the given sets as paths. One possible solution (not sure if it is minimal) is:

       0
     /   \
    C     A
   / \     \
  A   B     B
 / \   \     \
4   B   2     D
    |         |
    1         3

where the root node 0 is an empty placeholder element, and each numbered leaf marks the set that its path represents.

kaya3
Bojan
  • Have you thought of any solution yet? Even a brute force approach can help identify an optimal algorithm. – kiner_shah Feb 19 '20 at 13:23
  • @kaya3 There is no strict definition of optimal, but let's say minimum nodes. You are right about balanced; I actually need the minimum number of nodes. I edited the question. – Bojan Feb 19 '20 at 13:47
  • @kiner_shah I think that brute force will be very complex; I need something smarter. – Bojan Feb 19 '20 at 13:49
  • Minimising the number of nodes smells like it should be NP-complete, but I'll have a think about it. – kaya3 Feb 19 '20 at 13:49
  • My first attempt would be to brute force all possibilities, then think about optimizations (if it is too slow). – MrSmith42 Feb 19 '20 at 14:51
  • Brute force is not an option. Complexity for brute force will be O(n!*m!) and I have ~50K sets with ~5 elements each. – Bojan Feb 19 '20 at 15:07
  • Note that a simple greedy algorithm can give relatively good results, though I am not sure it is optimal. 1st iteration: select the element present in the most sets -> C. Then select the element most present in the remaining sets -> A, for example, etc. 2nd iteration: branch on the left (under the C) and repeat the process. However, here I assume that branches cannot be reconnected later. For example, the 1st A on the right cannot connect to the B on the left. – Damien Feb 19 '20 at 16:19
  • Is there a requirement that the tree be binary? – Dave Feb 19 '20 at 16:36
  • If the sets were drawn out as a graph, would the graph have to be connected, or can it be disconnected, e.g. 1. A, B, C 2. B, D, E. 3. F, G. These are not connected. – Guy Coder Feb 19 '20 at 18:34
  • Do the edges have to be unidirectional? Can the edges be directional? Can the edges have attributes? – Guy Coder Feb 19 '20 at 19:06
  • @Damien I already implemented it, without good results. – Bojan Feb 19 '20 at 21:03
  • @Dave it must not be binary. – Bojan Feb 19 '20 at 21:07
  • @GuyCoder It must not be a graph. No cycles are allowed. It must be a tree, and you can use the same node multiple times – Bojan Feb 19 '20 at 21:07
  • If you think about it, there is no reason you cannot start with a graph. Depending upon how you traverse the graph, you can end up with a tree. So why put a constraint on yourself? Also, you did not answer the question in the comment. Can the final tree be multiple disjoint trees? – Guy Coder Feb 19 '20 at 21:20
  • Could you provide a rather simple example where the greedy algorithm doesn't provide a good result? – Damien Feb 19 '20 at 21:26
  • @GuyCoder If we have the following rules: 1. A B 2. B C 3. C D 4. D A, the graph will be a simple cycle. How do you generate a tree from this graph? Yes, the final tree can be multiple disjoint trees. In the example that I provided there are two trees, and I'm using a null element as the main root. – Bojan Feb 19 '20 at 21:49
  • I think this problem is very similar to the [trie](https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014) data structure, but I am not sure. – kiner_shah Feb 20 '20 at 03:19
  • @kiner_shah It is, with one major difference: in a trie we insert words, which means the letters have an order. In my case, instead of words I have sets of letters, and they can be in any order; the order is not important. – Bojan Feb 20 '20 at 07:56
  • @Damien 1. {A, B} 2. {A, C} 3. {B} 4. {C}. If we choose A because of the largest number of occurrences, on the first level we will have A, B and C, and that tree will have 10 nodes. But if we choose just B and C for the first level, we will have a tree with 9 nodes. – Bojan Feb 20 '20 at 11:26
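
The greedy idea and the counterexample from the comments can be reproduced with a short sketch. None of this code comes from the thread: the function names `greedy_node_count` and `search_node_count` are hypothetical, ties are assumed to be broken alphabetically (in the counterexample A, B and C each occur twice, so this is what makes greedy pick A), and nodes are counted as root plus element nodes plus one leaf marker per set. The second function is not a proven optimal algorithm; it only tries every possible pick at each step, still moving all remaining sets that contain the picked element under it, and it is exponential, so it is only meant for tiny inputs.

```python
from collections import Counter


def greedy_node_count(sets):
    """Node count of the tree built by always picking the most frequent element."""
    def build(remaining):
        nodes = 1                                    # the current node
        nodes += sum(1 for s in remaining if not s)  # leaf marker per finished set
        todo = [s for s in remaining if s]
        while todo:
            counts = Counter(e for s in todo for e in s)
            best = max(sorted(counts), key=counts.get)  # ties: alphabetical (assumed)
            nodes += build([s - {best} for s in todo if best in s])
            todo = [s for s in todo if best not in s]
        return nodes

    return build([frozenset(s) for s in sets])


def search_node_count(sets):
    """Smallest node count over all pick orders (exponential, tiny inputs only)."""
    def below(remaining):
        # Number of nodes strictly below the current node.
        leaves = sum(1 for s in remaining if not s)
        return leaves + split([s for s in remaining if s])

    def split(todo):
        if not todo:
            return 0
        return min(
            1                                              # child node for e
            + below([s - {e} for s in todo if e in s])     # sets placed under e
            + split([s for s in todo if e not in s])       # sets left for siblings
            for e in set().union(*todo)
        )

    return 1 + below([frozenset(s) for s in sets])


counterexample = [{"A", "B"}, {"A", "C"}, {"B"}, {"C"}]
print(greedy_node_count(counterexample))  # 10
print(search_node_count(counterexample))  # 9
```

On this input the two functions return the 10 and 9 nodes mentioned in the comment above, which shows that the most-frequent-first rule can lose a node even on four sets.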

0 Answers