
I have started working on some algorithm problems and came across one that asks for the longest word contained in a string (the string has no spaces, only characters). After thinking about it for a while, I wanted to confirm whether I can use dynamic programming here, similar to the maximum contiguous sum problem. The idea: after reading each character I call an isWord method (already implemented). If the current run is a word, I move on to the next character and increase the word length; if it is not, I reset the counter to zero and start looking for a word from that index. Please let me know whether this is a good approach and, if not, what a better approach would be.

Thanks for your help guys.

-Vik

templatetypedef
Vik Singh
  • I think your approach would work. I think you'll get a lot more help here though if you provide sample strings that you are parsing (it helps us visualize exactly what you're doing.) Also you could mention what language you're doing it in and what restrictions you have (are you allowed to use regex for example). – Uncle Iroh Jun 29 '12 at 21:37

2 Answers


This algorithm will not work correctly. Consider the following string:

BENDOCRINE

If you begin at the start of the string and scan forward for as long as you still have a word, you will find the word "BEND", then reset the counter at that point and pick up again from the O. The correct answer here is instead the word "ENDOCRINE", which is much longer but starts inside the stretch you have already consumed.
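To make the failure concrete, here is a minimal Python sketch. It is only one possible reading of the scan-and-reset idea from the question: the toy dictionary `WORDS` and the helpers `is_word` and `is_prefix` are assumptions made for illustration, not the asker's actual `isWord` implementation. A brute-force check over all substrings is included for comparison.

```python
# Toy dictionary, assumed for illustration only.
WORDS = {"BEND", "END", "ENDOCRINE"}

def is_word(s):
    return s in WORDS

def is_prefix(s):
    # Hypothetical helper: could s still grow into a dictionary word?
    return any(w.startswith(s) for w in WORDS)

def greedy_longest(text):
    """Scan forward, extending the current run; reset when the run dies."""
    best = ""
    start = 0
    i = 0
    while i < len(text):
        candidate = text[start:i + 1]
        if is_prefix(candidate):
            if is_word(candidate) and len(candidate) > len(best):
                best = candidate
            i += 1
        else:
            if start == i:
                i += 1      # a single character that starts no word: skip it
            start = i       # reset and look for a word from this index
    return best

def brute_force_longest(text):
    """Check every substring; slow, but it cannot miss a word."""
    best = ""
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            if is_word(text[i:j]) and j - i > len(best):
                best = text[i:j]
    return best

print(greedy_longest("BENDOCRINE"))       # BEND
print(brute_force_longest("BENDOCRINE"))  # ENDOCRINE
```

The greedy scan never revisits the E at index 1 once it has committed to "BEND", which is exactly why it misses the longer word.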

If you have a static dictionary and want to find the longest word from that dictionary that is contained in a text string, you might want to look at the Aho-Corasick algorithm, which finds every occurrence of every string in a set inside a text string and does so extremely efficiently. You could easily modify the algorithm so that it tracks the longest word it has output so far and never outputs a string shorter than that, in which case the runtime will be O(n + m), where n is the length of the text string to search and m is the total number of characters in all legal English words. Moreover, the O(m) part is preprocessing that depends only on the dictionary: if you do it once in advance, from then on you can find the longest word in any given string in time O(n), where n is the number of characters in that string.

(As for why it runs in time O(n + m): normally the runtime is O(n + m + z), where z is the number of matches. If you restrict the output so that you never output a word shorter than the longest one found so far, each reported word is at least as long as the previous one and no word is longer than n, so at most n words are output. Thus the runtime is O(n + m + n) = O(n + m).)
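As a rough illustration of the modification described above, here is a compact Python sketch of an Aho-Corasick style automaton that stores, for each state, the length of the longest dictionary word ending there. The function names (`build_automaton`, `longest_word_in`) and the small word list are assumptions for the example; treat it as a sketch under those assumptions rather than a reference implementation.

```python
from collections import deque

def build_automaton(words):
    """Build goto/failure tables plus, per state, the longest word ending there."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # failure link of each state
    longest = [0]   # length of the longest dictionary word ending at this state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                longest.append(0)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        longest[state] = len(word)

    # Breadth-first pass: failure links of shallower states are ready first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the best word reachable through the failure link.
            longest[nxt] = max(longest[nxt], longest[fail[nxt]])
            queue.append(nxt)
    return goto, fail, longest

def longest_word_in(text, automaton):
    """Single left-to-right pass over text; returns the longest word found."""
    goto, fail, longest = automaton
    state = 0
    best = ""
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        n = longest[state]
        if n > len(best):
            best = text[i + 1 - n:i + 1]
    return best

automaton = build_automaton(["BEND", "END", "ENDOCRINE", "CRIME"])
print(longest_word_in("BENDOCRINE", automaton))  # ENDOCRINE
```

Building the automaton is the one-time preprocessing step; the same `automaton` can then be reused for any number of text strings.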

Hope this helps!

templatetypedef
  • Thanks a lot for the reply; I see how this algorithm fails in that case. I am going to check out the Aho-Corasick algorithm. While thinking about algorithms initially, I also had tries in mind. Do you think a trie could be helpful here, or does it have some limitations too? – Vik Singh Jun 29 '12 at 21:49
  • The algorithm @templatetypedef suggested will be much more efficient than building a trie from the dictionary, and using it to look for matches at each position. – Alex D Jun 29 '12 at 21:52
  • @user301214- You can think of the Aho-Corasick algorithm as an optimized trie-based algorithm that is designed to operate efficiently one character at a time. In the absence of this algorithm, using a standard trie would be reasonably good, though not ideal. – templatetypedef Jun 29 '12 at 22:29
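For comparison, the plain trie approach mentioned in the comments could look roughly like the following Python sketch. The helper names and the `_END` marker are assumptions for illustration. It restarts a trie walk at every position of the text, so in the worst case it costs about n times the length of the longest dictionary word, rather than a single pass.

```python
_END = object()  # end-of-word marker; can never collide with a text character

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root

def longest_word_with_trie(text, root):
    best = ""
    for start in range(len(text)):
        node = root
        # Walk the trie from this start position until no word can match.
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break
            if _END in node and end + 1 - start > len(best):
                best = text[start:end + 1]
    return best

trie = build_trie(["BEND", "END", "ENDOCRINE"])
print(longest_word_with_trie("BENDOCRINE", trie))  # ENDOCRINE
```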

Dynamic programming will not work for your problem:

Let seq1 and seq2 be two character sequences.

isWord(Concatenation(seq1, seq2)) cannot be inferred from the values of isWord(seq1) and isWord(seq2), so the answer for a longer sequence cannot be built up from the answers for its parts.
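A tiny Python illustration of this point, using a made-up dictionary (the `WORDS` set and `is_word` function are assumptions for the example): two pairs of sequences give identical isWord results for their parts, yet different results for the concatenation.

```python
# Toy dictionary, assumed for illustration only.
WORDS = {"CAR", "PET", "CARPET", "CAT", "DOG"}

def is_word(s):
    return s in WORDS

# Both parts are words in each pair, but only one concatenation is a word.
print(is_word("CAR"), is_word("PET"), is_word("CAR" + "PET"))  # True True True
print(is_word("CAT"), is_word("DOG"), is_word("CAT" + "DOG"))  # True True False
```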

Benoit