I came across this problem in an unrelated program I'm writing and I've spent quite a few hours trying to solve it because I though it would be fun. It was but I was unable to do it all the way. My code only solves the sequence of some subsets. This problem also feels like a general math problem that has probably been solved in a wide variety of ways over the decades but I lack the mathematical skills and terminology to find a solution or indeed anything about this specific problem online.
I have a set of subsequences that I know to be a part of a larger, unknown (super?)sequence. I don't think these subsequences are sets in the mathematical sense because they are ordered but they are similar in that they do not contain duplicate elements. Same goes for the master/super/whateversequence. (For clarity I will be referring to this as supersequence.)
The subsequences all contain the same type of data, however the data is not ordered alphabetically, in an ascending order, or anything like that. In a sense the data is in an arbitrary order: in that of the supersequence. And that is what I am interested in. I want to find the unknown supersequence of these subsequences.
For the sake of simplicity I've tried to solve this problem using letters of the alphabet, but I can later refactor the code to suit my needs. Obviously because I'm still trying to solve this problem I started by coming up with a suitable word for supersequence that does not contain duplicate elements: FLOWCHARTS.
Then I came up with the following six subsequences:
F,W,C,R
L,H,A
L,O,H,A,R,S
C,S
R,T,S
F,O,W,H,A,S
Here's my sequence ordering method:
// LinkedHashMappedKeyValueList keeps the data in the order it was inserted and allows one key to have multiple values.
private static LinkedHashSet<Character> orderSequence(final Set<Character> unorderedSequence, final LinkedHashMappedKeyValueList ruleMap)
{
List<Character> orderedSequence = new ArrayList<Character>(unorderedSequence);
// Order the sequence according to the rules.
System.out.println("---- ORDERING SEQUENCE ----");
for (Map.Entry<Character, LinkedHashSet<Character>> rule : ruleMap.entrySet())
{
char currentChar = rule.getKey();
LinkedHashSet<Character> ruleChars = rule.getValue();
System.out.println("Processing rule " + currentChar + "<" + ruleChars.toString());
if (orderedSequence.contains(currentChar))
{
int ruleCharIndex = -1;
int smallestRuleCharIndex = Integer.MAX_VALUE;
Iterator<Character> it = ruleChars.iterator();
// Find the rule character with the smallest index.
while (it.hasNext())
{
char ruleChar = it.next();
ruleCharIndex = orderedSequence.indexOf(ruleChar);
System.out.println("\tChecking for rule character: " + ruleChar + " (" + ruleCharIndex + ")");
if (ruleCharIndex > -1 && smallestRuleCharIndex > ruleCharIndex)
smallestRuleCharIndex = ruleCharIndex;
}
if (smallestRuleCharIndex != Integer.MAX_VALUE)
System.out.println("\tMoving '" + currentChar + "' before '"
+ orderedSequence.get(smallestRuleCharIndex) + "'.");
else
System.out.println("\tMoving '" + currentChar + "' to the end of the sequence.");
if (!moveBefore(orderedSequence.indexOf(currentChar), smallestRuleCharIndex, orderedSequence))
System.out.println("\tAlready in correct position.");
else
System.out.println("\tCurrent sequence: " + listToString(orderedSequence));
}
else
throw new ArithmeticException("Element of a subsequence not a part of the sequence.");
}
return new LinkedHashSet<Character>(orderedSequence);
}
In the end my code found the supersequence F,L,O,W,H,C,A,R,T,S
for these subsequences which is pretty close but not perfect. I also need to run my ordering method multiple times so the "algorithm" I came up with isn't perfect either. The "rule map" thing is a hash map where the key is another hash map of Character objects that came after the key Character in the subsequences (and thus in the supersequence).
Is there a Java library of some sort I could use that does this kind of sequence finding? Can someone point me in the right direction in terms of telling me what this is called and/or helping me find the right algorithms for the job?
Additionally, a shortened console output of my program:
---- BUILDING RULE MAP ----
Subsequences: F,W,C,R
L,H,A
L,O,H,A,R,S
C,S
R,T,S
F,O,W,H,A,S
All subsequences processed. Number of ordering rules: 10
Rule map: (F<[W, O]),(W<[C, H]),(C<[R, S]),(R<[, S, T]),(L<[H, O]),(H<[A]),(A<[, R, S]),(O<[H, W]),(S<[]),(T<[S])
---- BUILDING UNORDERED SEQUENCE ----
Sequence size is 10.
Unordered sequence: F,W,C,R,L,H,A,O,S,T
---- ORDERING SEQUENCE ----
Processing rule F<[W, O]
Moving 'F' before 'W'.
Already in correct position.
Processing rule W<[C, H]
Moving 'W' before 'C'.
Already in correct position.
Processing rule C<[R, S]
Moving 'C' before 'R'.
Already in correct position.
Processing rule R<[, S, T]
Moving 'R' before 'S'.
Current sequence: F,W,C,L,H,A,O,R,S,T
Processing rule L<[H, O]
Moving 'L' before 'H'.
Already in correct position.
Processing rule H<[A]
Moving 'H' before 'A'.
Already in correct position.
Processing rule A<[, R, S]
Moving 'A' before 'R'.
Current sequence: F,W,C,L,H,O,A,R,S,T
Processing rule O<[H, W]
Moving 'O' before 'W'.
Current sequence: F,O,W,C,L,H,A,R,S,T
Processing rule S<[]
Moving 'S' to the end of the sequence.
Current sequence: F,O,W,C,L,H,A,R,T,S
Processing rule T<[S]
Moving 'T' before 'S'.
Already in correct position.
Previous sequence: F,W,C,R,L,H,A,O,S,T
Ordered sequence: F,O,W,C,L,H,A,R,T,S
Sequences match: false
---- ORDERING SEQUENCE ----
Processing rule F<[W, O]
Moving 'F' before 'O'.
Already in correct position.
Processing rule W<[C, H]
Moving 'W' before 'C'.
Already in correct position.
Processing rule C<[R, S]
Moving 'C' before 'R'.
Current sequence: F,O,W,L,H,A,C,R,T,S
Processing rule R<[, S, T]
Moving 'R' before 'T'.
Already in correct position.
Processing rule L<[H, O]
Moving 'L' before 'O'.
Current sequence: F,L,O,W,H,A,C,R,T,S
Processing rule H<[A]
Moving 'H' before 'A'.
Already in correct position.
Processing rule A<[, R, S]
Moving 'A' before 'R'.
Current sequence: F,L,O,W,H,C,A,R,T,S
Processing rule O<[H, W]
Moving 'O' before 'W'.
Already in correct position.
Processing rule S<[]
Moving 'S' to the end of the sequence.
Already in correct position.
Processing rule T<[S]
Moving 'T' before 'S'.
Already in correct position.
Previous sequence: F,O,W,C,L,H,A,R,T,S
Ordered sequence: F,L,O,W,H,C,A,R,T,S
Sequences match: false
---- ORDERING SEQUENCE ----
Processing rule F<[W, O]
Moving 'F' before 'O'.
Current sequence: L,F,O,W,H,C,A,R,T,S
Processing rule W<[C, H]
Moving 'W' before 'H'.
Already in correct position.
Processing rule C<[R, S]
Moving 'C' before 'R'.
Current sequence: L,F,O,W,H,A,C,R,T,S
Processing rule R<[, S, T]
Moving 'R' before 'T'.
Already in correct position.
Processing rule L<[H, O]
Moving 'L' before 'O'.
Current sequence: F,L,O,W,H,A,C,R,T,S
Processing rule H<[A]
Moving 'H' before 'A'.
Already in correct position.
Processing rule A<[, R, S]
Moving 'A' before 'R'.
Current sequence: F,L,O,W,H,C,A,R,T,S
Processing rule O<[H, W]
Moving 'O' before 'W'.
Already in correct position.
Processing rule S<[]
Moving 'S' to the end of the sequence.
Already in correct position.
Processing rule T<[S]
Moving 'T' before 'S'.
Already in correct position.
Previous sequence: F,L,O,W,H,C,A,R,T,S
Ordered sequence: F,L,O,W,H,C,A,R,T,S
Sequences match: true
Sequence ordered according to the limits of the rule map.
Sequence found after 2 tries.
Expected sequence: F,L,O,W,C,H,A,R,T,S FLOWCHARTS
Found sequence: F,L,O,W,H,C,A,R,T,S FLOWHCARTS
Sequences match: false