Given a large document and a short pattern consisting of a few words (eg. W1 W2 W3), find the shortest string that has all the words in any order (for eg. W2 foo bar dog W1 cat W3 -- is a valid pattern)
I structured the "large document" as a list of strings. I believe my solution is O(nlog(n)), but I'm not sure (I'm also not sure whether it's correct). Is there a faster way? Please note that the below is pseudocoded Java, so obviously will not compile, but I believe the message is clear:
main(){
List<String> wordsToCheckFor;
List<String> allWords;
int allWordsLength = allWords.length;
int minStringLength = POS_INFINITY;
List<String> minString;
//The idea here is to divide and conquer the string; I will first
//check the entire string, then the entire string minus the first
//word, then the entire string minus the first two words, and so on...
for(int x = 0; x < allWordsLength; x++){
if(checkString(allWords, wordsToCheckFor) && (allWords.length < minStringLength)){
minString = allWords;
minStringLength = allWords.length();
}
allWords.remove(0);
}
System.out.println(minString);
}
checkString(List<String> allWords, List<String> wordsToCheckFor){
boolean good = true;
foreach(String word : wordsToCheckFor){
if(!allWords.contains(word))
good = false;
}
return good;
}