1

I have strings that are randomly generated from a set of special characters (B, C, D, F, X, Z). For example, the generator produces a list of strings like the following:

B D Z Z Z C D C Z
B D C
B Z Z Z D X 
D B Z F
Z B D C C Z
B D C F Z
..........

I also have a list of patterns. I want to match each generated string against these patterns, return the best-matching pattern, and extract some substrings from the string.

String patterns:

B D C [D must appear before C, i.e. DC]
B C F
B D C F
B X [if the string contains X, this pattern must be matched]
.......

For example:

B D Z Z Z C D C Z contains B and DC, so it can be matched by B D C.

D B Z C F contains B, C and F, so it can be matched by B C F.

D B Z D F contains B and F, so it can be matched by B F.

.......

Right now I am thinking about using a suffix array:

1. First, convert the string to a suffix array object.

2. Loop over the patterns and find which of them the suffix array can match.

3. Compare all matched patterns and pick the best one.

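// Pseudocode: ToSuffixArray and Match stand in for a real suffix array implementation.
var suffix_array = ToSuffixArray(str);
var list = new List<Pattern>();
for (int i = 0; i < patterns.Length; i++) {
    if (suffix_array.Match(patterns[i]))
        list.Add(patterns[i]);
}
// Pick the best of the matched patterns (assumes at least one pattern matched).
var max = list[0];
for (int i = 1; i < list.Count; i++) {
    if (list[i] > max)
        max = list[i];
}
Write(max);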

I think this method is too complex: it needs to build a tree for each pattern and then match that tree against the suffix array. Does anyone have a better idea?

====================update

I have found a better solution now. I created a new class that has one array-typed property per character (B, C, D, X, ...). Each property stores the positions at which that character appears in the string. If B does not appear in the string, we can stop processing immediately. We can also get all the C and D positions and then check whether they appear consecutively (DC, DCC, CCC, ...).
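
The position-index idea in the update could be sketched roughly as follows. This is only an illustration in Python, not the class described above: the function names (`index_positions`, `has_adjacent`, `quick_check`) are hypothetical, strings are assumed to be space-separated symbols as in the examples, and "consecutive" is assumed to mean that C sits at the position directly after D.

```python
from collections import defaultdict

def index_positions(text):
    """Map each symbol to the list of positions where it occurs."""
    positions = defaultdict(list)
    for i, symbol in enumerate(text.split()):
        positions[symbol].append(i)
    return positions

def has_adjacent(positions, first, second):
    """True if some `second` occurs directly after some `first`."""
    second_positions = set(positions.get(second, []))
    return any(p + 1 in second_positions for p in positions.get(first, []))

def quick_check(text):
    """Return None early if B is missing, else a few facts about the string."""
    positions = index_positions(text)
    if "B" not in positions:  # no B at all: stop processing immediately
        return None
    return {
        "has_DC": has_adjacent(positions, "D", "C"),
        "has_X": "X" in positions,
    }
```

With these assumptions, `quick_check("B D Z Z Z C D C Z")` reports that DC is present, while a string without B is rejected before any further work is done.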

zhengchun
  • Are you interested in all possible patterns? If you just want the best matching pattern, you need to define what "best" means, e.g. B D B C F can match both BDC and BCF. – Saeed Amiri Feb 22 '12 at 10:18
  • If I understand this correctly, you expect the pattern "B F" to match a string such as "C B A F", because "B" and "F" appear in it in the order "B first, then F", even though their occurrences are not adjacent. Is my understanding correct? (If it is, that will make it rather non-trivial to figure out a suffix-array based algorithm to solve this.) – jogojapan Feb 22 '12 at 10:48
  • I think a key question to be answered first is: Should you index the string and then apply the patterns to it one by one (as you do now), or should you "index" the patterns (into some kind of optimised dictionary, e.g. a search trie) and apply that to each position of the text? Or even both. To answer that it would be good to know about how many patterns there are, how long/short/specific/unspecific they typically are, and how long the text is likely to be. – jogojapan Feb 23 '12 at 23:58
  • @jogojapan, yes, what you said is correct, and that is the approach I am using. I save each character together with its position in a common array, and I create a separate index array for C. I also create a rule list based on a tree structure for C (CD, CC, CCD, ...). Combining these with if...else statements solved my problem. – zhengchun Feb 24 '12 at 07:41

2 Answers

0

I'm not sure what programming language you are using; have you checked its regular expression capabilities? If you are not familiar with regular expressions, you should be; hit Google.
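
As a rough illustration of the regular expression route, here is a minimal Python sketch. The matching rules are assumptions read off the question's examples rather than anything confirmed: symbols must appear in pattern order, gaps are allowed between them, and "DC" must stay adjacent. The helper names `pattern_to_regex` and `matches` are made up for this example.

```python
import re

def pattern_to_regex(pattern):
    """Turn a pattern such as "B D C" into a compiled regex (illustrative rules only)."""
    compact = pattern.replace(" ", "")
    # Assumption: "DC" is kept as one adjacent unit; every other symbol stands alone.
    units = re.findall(r"DC|.", compact)
    # Allow arbitrary symbols between consecutive units.
    return re.compile(".*".join(re.escape(unit) for unit in units))

def matches(pattern, text):
    """True if the space-separated text satisfies the pattern under the rules above."""
    return pattern_to_regex(pattern).search(text.replace(" ", "")) is not None
```

Under these assumptions, `"B D C"` becomes the regex `B.*DC`, which matches `B D Z Z Z C D C Z` but not `D B Z C F`.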

High Performance Mark
0
// Pseudocode: ToSuffixArray and Match stand in for a real suffix array implementation.
var suffix_array = ToSuffixArray(str);
var best = worst;  // worst possible value, presumably zero
for (int i = 0; i < pattern.Length; i++) {
  if (suffix_array.Match(pattern[i])) {
    if (pattern[i] > best) {
      best = pattern[i];
    }
    // add pattern[i] to a list here if you still want a list of all matches
  }
}
Write(best);

Roughly, anyway. If I understand what you're looking for, that's a slight improvement, though I'm sure there may be a better solution.
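
The single-pass selection above might look like this in Python. It is a sketch under stated assumptions: the suffix array match is swapped for a plain in-order (subsequence) check, which ignores the question's DC adjacency rule, and "best" is assumed to mean the pattern with the most symbols, since the question does not define it. `is_subsequence`, `best_pattern` and `score` are hypothetical names.

```python
def is_subsequence(pattern, text):
    """True if the pattern's symbols occur in the text in the same order."""
    symbols = iter(text.split())
    # `in` on an iterator consumes it, so each symbol is searched after the previous hit.
    return all(symbol in symbols for symbol in pattern.split())

def best_pattern(patterns, text, score=lambda p: len(p.split())):
    """Return the highest-scoring matching pattern, or None if nothing matches."""
    best = None
    for pattern in patterns:
        if not is_subsequence(pattern, text):
            continue
        # Assumption: a higher score means a better pattern; ties keep the first one.
        if best is None or score(pattern) > score(best):
            best = pattern
    return best
```

Starting from `None` rather than a "worst" placeholder avoids having to define a worst pattern, and passing a different `score` function changes what counts as best without touching the loop.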

Beeblbrox