
I have an algorithmic problem: given a number of unordered sets of elements, I need to find the shortest path (an ordered combination of the sets) that passes through all of those sets. There may be thousands of sets.

For example, let there be the following 4 unordered sets:
A=abcdefg
B=cd
C=abch
D=defi

The shortest path size is 11.

One possible solution is:
P=CADB=habcgdeficd
|P|=11

Note that sets may share elements with neighboring sets in the path!
There may also be duplicated elements belonging to different sets (as in the example above: 'c' and 'd' appear twice in P because B is appended after CAD).
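To make the definition of a path concrete, here is a small Python checker. It is only a sketch: `blocks_in_order` is a hypothetical helper, and it assumes that a set "appears" in P when some window of len(S) consecutive letters consists of exactly the letters of S, and that the blocks must start at strictly increasing positions in the given order. It takes the leftmost matching window each time and does not check that every letter of P is covered by some block.

```python
def blocks_in_order(path, order, sets):
    """Return the start index of each set's block in `path`, or None.

    Assumption: a block for set S is a window of len(S) letters whose
    letters are exactly S (so each letter appears once). Blocks must start
    at strictly increasing positions, following `order`; the leftmost
    matching window is taken for each set.
    """
    starts, pos = [], 0
    for name in order:
        s = sets[name]
        k = len(s)
        for i in range(pos, len(path) - k + 1):
            if set(path[i:i + k]) == s:  # k letters that are exactly s
                starts.append(i)
                pos = i + 1
                break
        else:
            return None  # no block for this set after the previous one
    return starts

sets = {name: set(letters) for name, letters in
        {"A": "abcdefg", "B": "cd", "C": "abch", "D": "defi"}.items()}
P = "habcgdeficd"
print(len(P), blocks_in_order(P, "CADB", sets))  # 11 [0, 1, 5, 9]
```

For P=CADB it finds C at 0, A at 1, D at 5 and B at 9; the c and d at positions 9-10 are the duplicated elements.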

Please suggest an algorithm to find the shortest path as described.
Thanks!

ekatz
  • Interesting as it is, I think this question would perhaps be better suited for Mathematics Stack Exchange (or even MathOverflow). Also, I doubt that an efficient solution exists for this problem. – MinosIllyrien Feb 14 '19 at 13:36
  • I tend to think the solution will be non-polynomial. Let's say you can answer the question "How many elements do these two sets have in common?" in `O(1)` using some precalculation. Now put every set in a graph and you get a `clique` in which every edge is this query. You need to find a path that goes over all of the vertices and maximizes the total intersection. I'm not sure, but I think it's `NP-hard`. – Yonlif Feb 14 '19 at 19:05

2 Answers


You have a graph:

  • the nodes are the sets;
  • the edge A-B exists if A and B have an intersection but neither is a subset of the other;
  • if the edge A-B exists, the distance A-B is the size of A union B.
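A minimal Python sketch of this graph, assuming the sets are given as a dict of Python sets (`build_graph` is a hypothetical name). Pairs that do not satisfy the condition simply get no edge.

```python
from itertools import combinations

def build_graph(sets):
    """Edges as defined above.

    An edge A-B exists iff A and B intersect and neither is a subset of
    the other; its weight is |A union B|. Returns {(name1, name2): weight}.
    """
    edges = {}
    for (a, sa), (b, sb) in combinations(sets.items(), 2):
        if sa & sb and not (sa <= sb or sb <= sa):
            edges[(a, b)] = len(sa | sb)
    return edges

sets = {name: set(letters) for name, letters in
        {"A": "abcdefg", "B": "cd", "C": "abch", "D": "defi"}.items()}
print(build_graph(sets))
# {('A', 'C'): 8, ('A', 'D'): 8, ('B', 'C'): 5, ('B', 'D'): 5}
```

On the question's example, A-B gets no edge because B is a subset of A, and C-D gets none because they are disjoint.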

You are looking for the shortest path that covers all nodes. That is a variant of the travelling salesman problem without the need to go back to the start.
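For a small number of sets, this path variant of TSP can be solved exactly with the Held-Karp dynamic program. It runs in O(2^n · n^2) time, so this is a sketch for small n only, nowhere near thousands of sets. `shortest_hamiltonian_path` is a hypothetical name; it assumes a dense matrix `dist` with `math.inf` for pairs that have no edge, and lets the path start at any node since there is no return to the start.

```python
import math

def shortest_hamiltonian_path(dist):
    """Held-Karp DP for the open-path TSP. Returns (cost, order)."""
    n = len(dist)
    full = (1 << n) - 1
    # dp[mask][j]: cheapest path visiting exactly `mask`, ending at node j
    dp = [[math.inf] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    for j in range(n):
        dp[1 << j][j] = 0  # no fixed start node
    for mask in range(1 << n):
        for j in range(n):
            if dp[mask][j] == math.inf:
                continue
            for k in range(n):
                if mask & (1 << k):
                    continue  # k already visited
                nxt = mask | (1 << k)
                cost = dp[mask][j] + dist[j][k]
                if cost < dp[nxt][k]:
                    dp[nxt][k] = cost
                    parent[nxt][k] = j
    end = min(range(n), key=lambda j: dp[full][j])
    if dp[full][end] == math.inf:
        return math.inf, []  # no path covers all nodes
    order, mask, j = [], full, end
    while j != -1:  # walk the parent pointers back to the start
        order.append(j)
        mask, j = mask ^ (1 << j), parent[mask][j]
    return dp[full][end], order[::-1]

print(shortest_hamiltonian_path([[0, 1, 4], [1, 0, 2], [4, 2, 0]]))  # (3, [2, 1, 0])
```

To use it here, set dist[i][j] to the edge weight where the edge exists; the EDIT below discusses why this particular distance may not measure the real path length.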

Some reading: What is the problem name for Traveling salesman problem(TSP) without considering going back to starting point?

EDIT: I'll try to summarize what was discussed in the comments and in my replies.

  1. What was not clear in the question is what to do when one set is a superset of another. I assumed that you wanted to keep those two sets apart, which is why I wrote: 'the edge A-B exists if A and B have an intersection but neither is a subset of the other'. For the TSP, just use an infinite distance between A and B when the edge does not exist; that covers the subset/superset case.

  2. The path is ordered (by definition of a path), but the sets are unordered. That's why this is not a (trivial) variation of the Shortest common superstring problem: a string is ordered, a set is not.

  3. The TSP idea doesn't work well with the distance defined above either, because:

    • the definition of the distance is not good: the distance should strictly decrease as the intersection grows. A solution would be `max(len(S) for S in sets) - len(A & B)`, i.e. the size of the largest set minus the size of the intersection.
    • more importantly: a set cannot use the same letters on "both sides". E.g. "abc" can't be at distance 1 from "bcd" and at distance 2 from "eb", because if you choose the path "a-bc-d", then the edge "abc" - "eb" no longer exists. Maybe a greedy choice would do the trick, but I'm not sure.
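To explore point 3, here is a brute-force Python sketch under a simplified model that is my own assumption, not something established in this thread: each set occupies a contiguous block of its distinct letters, only consecutive blocks may overlap, and a set's overlap with its left neighbour must be disjoint from its overlap with its right neighbour (the "both sides" rule above). Under that model, for a fixed order the overlaps can be chosen letter by letter: if a letter is shared by k consecutive neighbour pairs in a row, at most ceil(k/2) of those pairs can use it, because no two adjacent pairs may. The hard part is then the order, which this sketch simply enumerates, so it is only usable for a handful of sets. Because it ignores overlaps between non-adjacent blocks, it may miss shorter packings. `path_length` and `shortest_path` are hypothetical names.

```python
from itertools import permutations

def path_length(order):
    """Shortest string length for a FIXED order of sets, under the assumed
    model: one contiguous block per set, overlaps only between neighbours,
    and no letter shared with both neighbours of the same set."""
    total = sum(len(s) for s in order)
    overlap = 0
    for x in set().union(*order):
        run = 0  # consecutive neighbour pairs that both contain x
        for left, right in zip(order, order[1:]):
            if x in left and x in right:
                run += 1
            else:
                overlap += (run + 1) // 2  # adjacent pairs can't both share x
                run = 0
        overlap += (run + 1) // 2
    return total - overlap

def shortest_path(sets):
    """Exhaustive search over all n! orders: tiny inputs only."""
    def cost(names):
        return path_length([sets[n] for n in names])
    best = min(permutations(sets), key=cost)
    return cost(best), "".join(best)

sets = {name: set(letters) for name, letters in
        {"A": "abcdefg", "B": "cd", "C": "abch", "D": "defi"}.items()}
print(path_length([sets[n] for n in "CADB"]))  # 11
print(shortest_path(sets))                      # (11, 'BCAD')
```

On the question's example this model reaches the stated optimum of 11; BCAD is simply the first optimal order found, and CADB gives the same length.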
jferard
  • The problem starts when you have to include a set that fully contains another. A simple TSP won’t suffice. That said, the Shortest Common Superstring problem is pretty close to TSP. – ekatz Feb 20 '19 at 12:48
  • @ekatz As you can see, I wrote: "the edge A-B exists if A and B have an intersection but are not subset one of another". This condition forces the subsets/supersets to be separated. – jferard Feb 20 '19 at 17:23
  • As I wrote: "... I need to find the shortest path (**Ordered** combination of the sets) that pass through **all** of those sets." So with your solution I might end up with an ordered path in which some set does not appear. – ekatz Feb 21 '19 at 08:27

This question can be reduced to a variation of the Shortest common superstring problem.

ekatz
  • In your example, `abcdefghi` is a superstring of A,B,C,D. Try `>>> A,B,C,D=map(set, ["abcdefg", "cd", "abch", "defi"]) >>> all(set("abcdefghi") >= S for S in [A, B, C, D]) True`. But `abcdefghi` does not seem to be a real path (A contains B). – jferard Feb 18 '19 at 18:41
  • The string ‘abcdefghi’ is not a superstring of A,B,C,D, since no arrangement of the set C appears as a contiguous substring of it. – ekatz Feb 20 '19 at 12:55
  • My mistake: a **superset**. There was no mention of order in your first post, thus the superstring is the solution of another problem. – jferard Feb 20 '19 at 17:21
  • On the first sentence I wrote: "... the shortest path (Ordered combination of the sets) ..." – ekatz Feb 21 '19 at 08:21