Find longest subsequence s of String in a dictionary

Question

Given a String s such as "abccdde" and a dictionary {"ab","add","aced"}, find the longest dictionary word that is a subsequence of s. For this example the result is "add".

I was asked this in an interview. I gave an answer using a trie; its worst case is O(n*m), where n is the length of s and m is the length of the dictionary, but the average cost should be very low. I failed the interview because the interviewer thought my solution was not the best one. Does anyone have a better idea?

Can you provide a few more details about the constraints of the problem and your solution? If you store the *dictionary* in a trie, your complexity should be `O(n*k)` (where `k` is the length of the starting String), not `O(n*m)`. But that would be bad if you only query one word per dictionary. — Kittsil, Oct 20 '16 at 20:27
Thanks. My solution uses a trie to store the dictionary, but I add a tag to every node showing whether the prefix ending at that node is a subsequence of the part of the String read so far. — Trumen, Oct 21 '16 at 05:01
For "abdc" and the dictionary {"abc","adc"}, the trie is "a->(b->c,d->c)". Initially, all nodes' tags are set to false. I use a HashMap to store the nodes we are interested in next; of course, at the beginning the map is {'a':"a node"}. I start to traverse "abdc": when I find 'a', I set "a node" to true and add "b node" and "d node" to the map. I use dynamic programming to traverse the string and update every node's state. The longest "true" node will be the solution. — Trumen, Oct 21 '16 at 07:34
I suddenly realized the time is not O(m*n) but O(n+A), where A is the total length of all the valid subsequences in the string, because I only traverse every valid node once. — Trumen, Oct 21 '16 at 07:34
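
The trie idea described in these comments can be sketched roughly as follows. This is only one reading of the description, not the asker's actual code: the names `TrieNode`, `build_trie`, `longest_word_via_trie`, and `frontier` are made up for illustration, and the boolean tag is represented implicitly by a node having been removed from the frontier.

```python
class TrieNode:
    def __init__(self, depth=0):
        self.depth = depth      # length of the prefix this node spells
        self.children = {}      # char -> TrieNode
        self.word = None        # set when a dictionary word ends here


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode(node.depth + 1)
            node = node.children[ch]
        node.word = word
    return root


def longest_word_via_trie(text, words):
    root = build_trie(words)
    # frontier[c]: nodes labelled c whose parent prefix is already matched.
    frontier = {ch: [child] for ch, child in root.children.items()}
    best = None
    for ch in text:
        # Detach the list first, so children added under the same character
        # wait for a later occurrence. Each node is visited at most once.
        for node in frontier.pop(ch, []):
            if node.word is not None and (best is None or node.depth > len(best)):
                best = node.word
            for child_ch, child in node.children.items():
                frontier.setdefault(child_ch, []).append(child)
    return best


print(longest_word_via_trie("abdc", ["abc", "adc"]))  # abc (ties go to the word reached first)
```

Because every trie node enters and leaves the frontier at most once, the scan costs time proportional to the length of the text plus the number of trie nodes reached, which matches the O(n+A) estimate above.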

user3145102 · Answer 1 · 2016-12-17T17:39:37.227

You can build a graph whose vertices are the letters of your alphabet. For each word in your dictionary, add the word to the list of its first character, something like:

G[word[0]].add({word, 0})

Then, as you scan your text, for each letter you visit the adjacency list of that letter. Each item in that list is moved to the list of the next character of its word.

With your example:

S = "abccdde", D = {"ab","add","aced"}

First step:

G = {{'a', [{"ab", 0}, {"add", 0}, {"aced", 0}]}}

For each character in S

character = 'a' -> S[0]

You visit the list for that character

[{"ab", 0}, {"add", 0}, {"aced", 0}]

and update your graph

G = {{'b', [{"ab", 1}]}, {'d', [{"add", 1}]}, {'c', [{"aced", 1}]}}

character 'b' -> S[1]

You visit the list for that character

[{"ab", 1}]

and update your graph

G = {{'d', [{"add", 1}]}, {'c', [{"aced", 1}]}}

As you have finished "ab", you can try to improve your answer.

character 'c' -> S[2]

You visit the list for that character

[{"aced", 1}]

and update your graph

G = {{'d', [{"add", 1}]}, {'e', [{"aced", 2}]}}

character 'c' -> S[3]

There is no list for that character, so you continue with the next character.

character 'd' -> S[4]

You visit the list for that character

[{"add", 1}]

and update your graph

G = {{'d', [{"add", 2}]}, {'e', [{"aced", 2}]}}

...
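
The walkthrough above could be written out in Python along these lines. This is a sketch under a few assumptions rather than the answerer's own code: the function name `longest_dictionary_subsequence` and the `waiting` dictionary (playing the role of G) are hypothetical, and empty words are simply skipped.

```python
from collections import defaultdict


def longest_dictionary_subsequence(text, words):
    """Return the longest word in `words` that is a subsequence of `text`, or None."""
    # waiting[c] holds (word, index) pairs whose next needed character is c.
    waiting = defaultdict(list)
    for word in words:
        if word:  # skip empty strings, which have no first character
            waiting[word[0]].append((word, 0))

    best = None
    for ch in text:
        # Detach the bucket first so that entries re-queued under the same
        # character are only advanced by a later occurrence of it.
        bucket = waiting.pop(ch, [])
        for word, index in bucket:
            index += 1
            if index == len(word):
                # Word fully matched: keep it if it beats the current best.
                if best is None or len(word) > len(best):
                    best = word
            else:
                waiting[word[index]].append((word, index))
    return best


print(longest_dictionary_subsequence("abccdde", ["ab", "add", "aced"]))  # add
```

Each (word, index) entry moves forward once per matched character, so the total work is roughly the length of the text plus the total number of characters in the dictionary.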

score 0 · Answer 2 · answered Oct 21 '16 at 09:13

You could use this method

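// Returns true when item is a subsequence of ch.
public static bool IsSubsequence(string ch, string item)
{
    // A longer string can never be a subsequence of a shorter one.
    if (ch.Length < item.Length)
    {
        return false;
    }
    int indexItem = 0; // next character of item still to be matched
    int indexCh = 0;   // current position in ch
    while (indexCh < ch.Length && indexItem < item.Length)
    {
        if (ch[indexCh] == item[indexItem])
        {
            indexItem++;
        }
        indexCh++;
    }
    // Every character of item was found, in order.
    return indexItem == item.Length;
}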

The method itself is O(n). You could also start by sorting the dictionary items by word length, longest first, so that the first one that returns true is the result.
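
For comparison, the same idea (check the longest words first and stop at the first hit) might look like this in Python. The names `is_subsequence` and `longest_match` are illustrative only, and the iterator trick is a swapped-in idiom rather than a translation of the C# loop.

```python
def is_subsequence(word, text):
    """True if `word` can be obtained from `text` by deleting characters."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to the first match, so each
    # character of `word` must be found after the previous one.
    return all(ch in remaining for ch in word)


def longest_match(text, dictionary):
    # Try the longest words first; the first hit is a longest subsequence.
    for word in sorted(dictionary, key=len, reverse=True):
        if is_subsequence(word, text):
            return word
    return None


print(longest_match("abccdde", ["ab", "add", "aced"]))  # add
```

Sorting does not change the O(n*m) worst case, but it lets the search stop as soon as one word matches.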

AnotherGeek


Are you sure it's O(n)? I need to compare every item, so your method should be O((n+l)*m), where m is the length of the dictionary and l is the length of a dictionary item – Trumen Oct 22 '16 at 01:44
I meant the function itself is O(n): the while loop always increments indexCh, so its body executes at most n times (where n is the length of ch). So if you have m strings in the dictionary, the complexity is O(n*m) – AnotherGeek Oct 24 '16 at 07:09

sh1 · Answer 3 · 2016-12-17T18:19:29.140

Arrange your dictionary data structure such that for each valid letter you advance deeper into the tree (looking at words one letter longer), with the addition of a default case (no matching letter with this prefix) pointing to the longest valid prefix shorter than the current depth (and also a flag saying that you already have a word, just in case that turns out to be your best option).

When you get a miss on 'asparag' followed by 'r', your dictionary directs you to the tree for 'sparag' iff there are any such words, and if not then it directs you to 'parag'.

For each failure you repeat the test and recurse to a shorter word if there's still no match; so this is still worse than O(n)... although a moment of casual thought suggests worst case might be O(2n).

To speed that up the default could be a list of defaults from which you pick the entry matching the current letter. All entries will match with at least an entry of length 0 (current letter starts no words) or 1 (just the current letter).

Jason S · Answer 4 · 2017-10-18T18:18:38.620

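Here is my Python code for a solution. Let N be the number of characters in the string (N = 7 for "abccdde"), W the number of words in the dictionary (W = 3), and L the total number of characters in the dictionary (L = 9 for {"ab","add","aced"}). Each word keeps a pointer to the next character it needs, and the string is scanned once. Because the inner loop visits every word for each character of the string, the code as written runs in O(N * W) time rather than O(N + L); it is linear in N only when the number of words is treated as a constant.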

def find_longest_word_in_string(string, words):
    # m[word] = number of leading characters of word matched so far
    m = {}

    for word in words:
        m[word] = 0

    for c in string:
        for word in m:
            # Advance the word's pointer when c is the character it needs next
            if m[word] < len(word) and word[m[word]] == c:
                m[word] += 1

    res = None  # stays None when no word is a subsequence
    length = 0
    for word in m:
        if m[word] == len(word) and m[word] > length:
            res = word
            length = m[word]

    return res

if __name__ == '__main__':
    s = "abccdde"
    words = ["ab", "add", "aced"]
    print(find_longest_word_in_string(s, words))

Running it prints 'add'.

You are using two for loops; doesn't that make it an O(n^2) solution? — ravi tanwar, Aug 07 '18 at 05:13

score 0 · Answer 5 · answered May 25 '18 at 19:02

I don't think a single loop is enough to solve this; it needs at least two loops, I guess. This is my approach:

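import java.util.Arrays;
import java.util.Comparator;

public String lookFor(String inputWord, String[] dictionary) {

  // Sort by length so that the longest words come last
  Arrays.sort(dictionary, Comparator.comparingInt(String::length));

  // Scan from the longest word down to index 0
  for (int index = dictionary.length - 1; index >= 0; index--) {
    if (isTheWordASubsequence(inputWord, dictionary[index]))
      return dictionary[index];
  }

  return null;
}

private boolean isTheWordASubsequence(String inputWord,
    String dictionaryWord) {

  int spot = 0; // next position in inputWord that may still be used

  for (char item : dictionaryWord.toCharArray()) {
    int offset = inputWord.indexOf(item, spot);
    if (offset < 0)
      return false;
    spot = offset + 1; // move past the match so it is not reused
  }

  return true;
}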
