Deterministic automata to find number of subsequences in string?

How can I construct a DFA to find the number of occurrences of a string as a subsequence in another string?

E.g. in "ssstttrrriiinnngggg" we have 3 subsequences that form the string "string".

Also, both the string to be found and the string to be searched contain only characters from a specific character set. I have some idea about storing characters in a stack and popping them until we match, pushing them again if they don't match. Is there a DFA solution?

io10
  • I suggest you write some code and run it. That's how algorithms are typically implemented. –  Jan 26 '14 at 17:14
  • You are describing a push-down DFA, which is not exactly a DFA, and can express a context-free language, while a 'classic' DFA can express only a regular language. – amit Jan 26 '14 at 17:16
  • @H2CO3 What I would code uses the idea of a Turing machine tape: find the first letter of the string to be found in the other string, then the next letter, then go back to the beginning and scan again ...... – io10 Jan 26 '14 at 17:17
  • What I am asking is whether we can design a DFA for such a problem or not. – io10 Jan 26 '14 at 17:18
  • Your question is not specific enough. Are you searching for given subsequences, like checking whether 1ab2cd3ef contains 123, or are you after all subsequences? BTW: if you can write a regular expression (classic textbook, not Perl or Java) for your task, you are obviously done. – Harald Jan 26 '14 at 17:35
  • Deterministic finite automata cannot count, so there's no way to do it with a DFA. – n. m. could be an AI Jan 26 '14 at 17:37
  • Are you looking to count the number of overlapping or nonoverlapping subsequences? (Your example seems to suggest nonoverlapping?) – Peter de Rivaz Jan 26 '14 at 17:50
  • Sorry, I corrected my example: nonoverlapping subsequences. – io10 Jan 26 '14 at 17:53
  • Well, I get the idea that a DFA has finite memory so it cannot count, but what if we restrict the number of such subsequences to some maximum of (N/M), where N = |largest string that can be given| and M = |string to be found|? Also, both the string to be found and the string to be searched contain only characters from a specific character set. – io10 Jan 26 '14 at 17:55

1 Answer

OVERLAPPING MATCHES

If you wish to count the number of overlapping sequences then you simply construct a DFA that matches the string, e.g.

1 -(if see s)-> 2 -(if see t)-> 3 -(if see r)-> 4 -(if see i)-> 5 -(if see n)-> 6 -(if see g)-> 7

and then compute the number of ways of being in each state after seeing each character using dynamic programming. See the answers to this question for more details.

DP[a][b] = number of ways of being in state b after seeing the first a characters
         = DP[a-1][b] + DP[a-1][b-1] if character at position a is the one needed to take state b-1 to b
         = DP[a-1][b] otherwise

Start with DP[0][b]=0 for b>1 and DP[0][1]=1.

Then the total number of overlapping subsequences is DP[len(string)][7], where string is the text being searched and 7 is the final state of the DFA above.
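
As a rough illustration of the recurrence above, here is a minimal Python sketch. The function name `count_subsequences` is made up for this example, states are numbered from 0 instead of 1, and the table is collapsed to a single row (an implementation shortcut, not part of the recurrence itself).

```python
def count_subsequences(text, pattern):
    """Count the (possibly overlapping) ways pattern occurs as a subsequence of text."""
    # dp[b] = number of ways of being in state b, i.e. having matched the
    # first b characters of pattern, after the characters read so far.
    dp = [0] * (len(pattern) + 1)
    dp[0] = 1  # exactly one way to have matched nothing yet

    for ch in text:
        # Walk the states from high to low so that dp[b - 1] still holds the
        # value from before this character (it plays the role of DP[a-1][b-1]).
        for b in range(len(pattern), 0, -1):
            if pattern[b - 1] == ch:
                dp[b] += dp[b - 1]

    return dp[len(pattern)]


print(count_subsequences("ssstttrrriiinnngggg", "string"))  # 3*3*3*3*3*4 = 972
```

Unlike the non-overlapping variant, this count is also valid when the pattern contains repeated characters.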

NON-OVERLAPPING MATCHES

If you are counting the number of non-overlapping sequences, then if we assume that the characters in the pattern to be matched are distinct, we can use a slight modification:

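DP[a][b] = number of partial matches in state b after seeing the first a characters
         = DP[a-1][b] + 1 if character at position a is the one needed to take state b-1 to b and DP[a-1][b-1]>0
         = DP[a-1][b] - 1 if character at position a is the one needed to take state b to b+1 and DP[a-1][b]>0
         = DP[a-1][b] otherwise

The second and third lines describe the same event seen from two neighbouring states: when a partial match advances, the state it leaves loses one and the state it enters gains one.

Start with DP[0][b]=0 for b>1 and DP[0][1]=infinity, since a new match can always be started.

Then the total number of non-overlapping subsequences is DP[len(string)][7]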

This approach will not necessarily give the correct answer if the pattern to be matched contains repeated characters (e.g. 'strings').
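
A minimal Python sketch of this non-overlapping counting, under the same assumption that the pattern characters are distinct. The function name `count_disjoint_subsequences` and the choice to raise `ValueError` for a repeated character are assumptions of this example. States are numbered from 0, and the "infinity" in the start state is modelled by never decrementing it.

```python
def count_disjoint_subsequences(text, pattern):
    """Count non-overlapping occurrences of pattern as a subsequence of text.

    Assumes every character of pattern is distinct.
    """
    if len(set(pattern)) != len(pattern):
        raise ValueError("pattern characters must be distinct")

    # position[ch] = state that ch advances (state b -> state b + 1).
    position = {ch: b for b, ch in enumerate(pattern)}

    # counts[b] = number of partial matches that have matched b characters.
    # counts[0] is unused: state 0 acts as an unlimited supply.
    counts = [0] * (len(pattern) + 1)

    for ch in text:
        b = position.get(ch)
        if b is None:
            continue  # character does not occur in the pattern
        if b == 0:
            counts[1] += 1  # a new match can always be started
        elif counts[b] > 0:
            counts[b] -= 1      # one partial match leaves state b ...
            counts[b + 1] += 1  # ... and enters state b + 1

    return counts[len(pattern)]


print(count_disjoint_subsequences("ssstttrrriiinnngggg", "string"))  # 3
```

In the example string the fourth "g" finds no partial match waiting for it, so it is ignored and the result stays at 3.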

Peter de Rivaz
  • I think there is some confusion in the non-overlapping solution: are the conditions in lines 2 and 3 the same? – io10 Jan 26 '14 at 18:55
  • @Peter de Rivaz Should we not increase DP[a][b+1] by 1 in line 3, or subtract 1 from DP[a-1][b-1] in line 2? – io10 Jan 26 '14 at 19:07