If I want to find the longest common substring of two strings, which approach is more efficient in terms of time and space complexity: suffix arrays or DP?

DP takes O(m*n) time and O(m*n) space. What is the time complexity of the suffix array approach?
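
For reference, the DP approach can be sketched as below. This is a minimal illustration, not a tuned implementation; the function name `longest_common_substring_dp` is made up for this example. It keeps only the previous row of the table, so time stays O(m*n) while space drops to O(min(m, n)) instead of the full O(m*n) table.

```python
def longest_common_substring_dp(a: str, b: str) -> str:
    """Return one longest common substring of a and b (rolling-row DP)."""
    # Make b the shorter string so each row has min(m, n) + 1 cells.
    if len(a) < len(b):
        a, b = b, a

    prev = [0] * (len(b) + 1)
    best_len = 0   # length of the best match found so far
    best_end = 0   # exclusive end index of that match inside a

    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr

    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring_dp("xabcdy", "zabcdw"))
```

For two short or medium-sized strings this is usually the simpler option to write and debug.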

1) Calculate the suffixes: O(m) + O(n)
2) Sort them: O((m+n) log2(m+n))
3) Find the longest common prefix for m+n-1 strings? [I'm not sure how to calculate the number of comparisons]

Suffix arrays allow us to do many more things with substrings (such as substring search), but since the rest of those functions are not needed here, would DP be considered the easier/cleaner approach? Which one should be used when we are comparing 2 strings?

Also, what if we have more than 2 strings?

user1071840

1 Answer

A suffix array would be better. The LCS (longest common substring of n strings) problem can be solved as follows:

  1. Concatenate S1, S2, ..., Sn as S = S1$1S2$2...Sn$n, where the $i are distinct special symbols (sentinels) that are lexicographically smaller than every symbol of the original alphabet.
  2. Compute the suffix array of S. Suffix arrays are commonly built in O(n log n), but there is an important algorithm called DC3 (also known as the skew algorithm) that builds one in O(n), where n is the total length of the concatenated strings. You can google this algorithm.
  3. Compute the LCP (longest common prefix) of all adjacent suffixes in the suffix array. For two strings, the answer is the largest LCP value whose two adjacent suffixes start in different strings.
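
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `longest_common_substring_sa` is made up, the suffix array is built by plainly sorting suffixes (simple but slower than O(n log n), and standing in for DC3), and the LCP array uses Kasai's algorithm. For more than two strings it slides a window over the suffix array until the window holds a suffix from every string; the smallest LCP inside that window is the length of a substring shared by all of them.

```python
from collections import Counter


def longest_common_substring_sa(strings):
    """Longest substring common to all strings, via suffix array + LCP."""
    k = len(strings)
    if k == 0 or any(not s for s in strings):
        return ""
    if k == 1:
        return strings[0]

    # Step 1: concatenate, encoding characters as integers shifted by k so
    # that sentinels 0..k-1 are unique and smaller than every real symbol.
    text, owner = [], []
    for idx, s in enumerate(strings):
        text.extend(ord(c) + k for c in s)
        owner.extend([idx] * len(s))
        text.append(idx)   # sentinel closing string idx
        owner.append(-1)   # sentinel positions belong to no string
    n = len(text)

    # Step 2: suffix array by naive sorting (assumption: stands in for DC3).
    sa = sorted(range(n), key=lambda i: text[i:])

    # Step 3: Kasai's algorithm; lcp[r] is the LCP of sa[r - 1] and sa[r].
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1

    # The k sentinel suffixes sort first, so real suffixes start at rank k.
    count = Counter()
    best_len, best_start = 0, 0
    lo = k
    for hi in range(k, n):
        count[owner[sa[hi]]] += 1
        # Drop left suffixes whose string is still represented in the window.
        while count[owner[sa[lo]]] > 1:
            count[owner[sa[lo]]] -= 1
            lo += 1
        if len(count) == k:
            window_min = min(lcp[lo + 1:hi + 1])
            if window_min > best_len:
                best_len, best_start = window_min, sa[hi]

    return "".join(chr(c - k) for c in text[best_start:best_start + best_len])


if __name__ == "__main__":
    print(longest_common_substring_sa(["xabcdy", "zabcdw"]))
```

Taking `min` over the window slice keeps the sketch short; a monotonic queue would bring that scan down to linear time overall.
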
jfly
    Can you please explain a little more? I could do this for two strings with a DC3 suffix array and an LCP array: there I look for the largest LCP value such that positions i and i-1 point to different strings. For 10 such strings, do I need to check 10 consecutive values in the LCP array such that each of them points to a different string? – Naman Jun 13 '15 at 23:42