I'm a newbie. I've been learning about algorithms for two months now. I'm generally doing okay, but I'm not good at understanding search algorithms or implementing them. I'm particularly stuck on one pattern search algorithm, Reverse Factor. I've been researching it for a week now, but I still don't completely understand it, let alone know how to implement it. I don't have anyone I could ask, but I don't want to skip any algorithms. So far I've found the pseudocode below, but I don't understand it well. I'm also not a native speaker. Can you help me?

The purpose is to search for a pattern p in a string t. Here m is the length of p and n is the length of t.

Algorithm RF /* reverse factor string matching */
    /* denote t[i + j.. i + m] by x;
         it is the last-scanned part of the text */

    i:= 0; 
    while i ≤ n - m do
    {
        j := m;
        while j > 1 and x ∈ FACT(p)
            do j := j - 1;
        /* in fact, we check the equivalent condition x^R ∈ FACT(p^R) */
        if x = p then
            report a match at position i;
        shift := RF_shift[x];
        i := i + shift;
    }
end.

FACT(p) is the set of all factors (substrings) of p.

Thank you in advance.

Ellie Doe
  • "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, [describe the problem](https://stackoverflow.com/help/how-to-ask) and what has been done so far to solve it." – Yury Tarabanko Apr 13 '18 at 15:44
  • I don't think OP asked for an off-site resource though, rather help understanding and implementing the algorithm. – Aaron Apr 13 '18 at 15:56
  • No, they're right. I edited the algorithm in, they had already written the comment before then. I'm sorry for not clarifying it. – Ellie Doe Apr 13 '18 at 16:01
  • What don't you understand about it? I assume that you've already performed your "due diligence" -- researched explanations on line, walked through the code with pencil & paper, and/or coded the algorithm, inserted `print` statements, and traced the values. Where are you stuck? – Prune Apr 13 '18 at 16:40
  • We're missing some definitions: `xEFACT`, `FACT`, and a reference to the purpose of this function. – Prune Apr 13 '18 at 16:44
  • I did research explanations, but I couldn't find much material or code. Everything I found said the exact same thing and didn't go into depth. I found this algorithm in an article on pattern search algorithms. What I understand so far is that we use a suffix tree of the reverse pattern for searching, and that the algorithm is similar to Boyer-Moore. – Ellie Doe Apr 13 '18 at 16:54
  • @Prune: the purpose is "search a pattern p in string t". `xEFact(p)` means `x elementOf Fact(p)` and `Fact(p)` is the set of all factors (substrings) of p. An important part of this algorithm is how the shift array is constructed. – CoronA Apr 13 '18 at 16:57
  • @EllieDoe: thanks for the update. Please edit that into your question. – Prune Apr 13 '18 at 17:20
  • @CoronA: Please, either you or Ellie edit your expansions into the question. This will make it more useful for the archives. – Prune Apr 13 '18 at 17:21
  • @EllieDoe: Better to remove the excuses and move the purpose of the algorithm to the front. The definition of x ∈ Fact was fine after the first edit, the definition of Fact is probably helpful, and the hint about shift is obvious. – CoronA Apr 13 '18 at 17:30
  • Sorry I meant that you do not have to point out that the construction of shift is important. It is missing, but an interested reader could check it [here](http://www-igm.univ-mlv.fr/~lecroq/string/node23.html) – CoronA Apr 13 '18 at 17:36

1 Answer

I'll give it a try:

i := 0; // start at character 0
while i ≤ n - m do
{
    j := m; // start at character i + m (potentially the last character of a match)
    while j > 1 and x ∈ FACT(p)
        do j := j - 1; // step back as long as t[i+j..i+m] is a substring of the pattern p
    /* in fact, we check the equivalent condition x^R ∈ FACT(p^R) */
    if x = p then // x can equal p only when the scan covered the whole window
        report a match at position i;
    shift := RF_shift[x]; // look up the number of chars to advance
    i := i + shift; // advance
}

The construction of the array shift is quite hard, and I cannot remember how it is done. However, I can say what one would find at shift[x].

shift[x] = the number of characters the window can safely be shifted so that the next search does not miss a match.
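
To make that definition concrete, here is a minimal Python sketch of one way such a shift could be computed by brute force. It assumes that shift[x] equals m minus the length of the longest prefix of p (shorter than p itself) that is also a suffix of x. The name `rf_shift` is made up for illustration, and a real implementation would get this from the suffix automaton instead of comparing strings.

```python
def rf_shift(p, x):
    """Naive safe shift after scanning x (hypothetical helper, not the real table).

    x is the part of the current window that was scanned from the right
    and recognised as a factor of p.
    """
    m = len(p)
    # Try the longest overlap first; k is capped at m - 1 so that a full
    # match (x == p) still moves the window forward.
    for k in range(min(len(x), m - 1), 0, -1):
        if x.endswith(p[:k]):
            # A prefix of p of length k ends the window, so align p with it.
            return m - k
    # No prefix of p ends the window: skip the whole window.
    return m
```

With the pattern bcd, this gives `rf_shift("bcd", "bc") == 1` and `rf_shift("bcd", "") == 3`.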

Example: searching the string abcabcdab for the pattern bcd (| marks i+m, * marks i+j):

abc*|abcdab // start with i=0,j=3
ab*c|abcdab // c is a factor => continue
a*bc|abcdab // bc is a factor => continue
*abc|abcdab // abc is not a factor => shift = shift[bc] = 1
abca*|bcdab 
abc*a|bcdab // a is not a factor => shift = shift[] = 3
abcabcd*|ab 
abcabc*d|ab // d is a factor => continue
abcab*cd|ab // cd is a factor => continue
abca*bcd|ab // bcd is a factor and j = 0 => report match
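
A trace like the one above can be reproduced with a small, deliberately naive Python sketch. This is not the suffix-automaton implementation: FACT(p) is built as a plain set of substrings, and the shift is taken from the last prefix of p seen while scanning, which I am assuming agrees with what the shift table would give. The names `factors` and `reverse_factor_search` are hypothetical, and indices are 0-based.

```python
def factors(p):
    """All substrings of p (naive, O(m^2) entries; fine for small patterns)."""
    return {p[a:b] for a in range(len(p) + 1) for b in range(a, len(p) + 1)}


def reverse_factor_search(p, t):
    """Return the 0-based start positions of p in t, scanning windows right to left."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    fact = factors(p)
    matches = []
    i = 0
    while i <= n - m:
        j = m        # t[i+j:i+m] is the scanned part x, initially empty
        shift = m    # default: no prefix of p was seen inside the window
        # Extend x one character to the left while it stays a factor of p.
        while j > 0 and t[i + j - 1:i + m] in fact:
            j -= 1
            # If x is a prefix of p, a match could start at window offset j.
            if j > 0 and p.startswith(t[i + j:i + m]):
                shift = j
        if j == 0:
            # The whole window is a factor of p with length m, so it equals p.
            matches.append(i)
        i += shift
    return matches
```

For example, `reverse_factor_search("bcd", "abcabcdab")` returns `[4]`, using the shifts 1 and 3 from the trace.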

See here for an example for debugging in Java. It is not as simple as your pseudocode, but you can step through it in a debugger to understand it better.

CoronA
  • Thank you for your explanation. It is really simple and I understand the algorithm a lot better now. I'm reading the example code. The way I understand it is that most of the code is for the suffix tree. I didn't realize it had such a big part in this. Is it all necessary for the algorithm to work? – Ellie Doe Apr 13 '18 at 18:01
  • I think the suffix tree is necessary for determining efficiently whether a string is a factor of p and for computing the shift array. In fact, there are algorithms that are simpler than Reverse Factor. – CoronA Apr 13 '18 at 18:19
  • I mean, 384 lines? Why? Though the code looks like art and I think I'm understanding it, slowly. Thank you, again. – Ellie Doe Apr 13 '18 at 18:48
  • I do not know if this could be simplified, the [original code](http://www-igm.univ-mlv.fr/~lecroq/string/node23.html) is slightly shorter. – CoronA Apr 13 '18 at 19:38