-1

I have a bit of trouble implementing the BMH algorithm in C++.

Here's the code:

#define Nm 2000005
int D[256];
char To[Nm],P[Nm],*T;
int Tl,Pl;
int cont;
void initialize_Lenght()
{
    Tl=strlen(To);
    Pl=strlen(P);
    T=To;
}
void compute_D()
{
    for(int i=0;i<256;i++)
        D[i]=Pl;
    for(int i=0;i<Pl;i++)
        D[P[i]]=Pl-i;
}
void Boyer_Moore()
{
    int i;

    while ( Tl>=Pl )    
    {
        for(i=Pl-1;T[i]==P[i]&&i>=0;i--)
            if(i==0)
            {
                if(cont<1000)
                    v[cont]=(T-To); // I also have to store the first 1000 values 
                cont++;
            }
            Tl -= D[T[i+1]];
            T += D[T[i+1]];
    }
}

It works for most examples but there are some examples that do not work (only ones which I found thus far are huge tests downloaded from different sources).

I would want to know where / what I am doing wrong (I do not really want code).

Edit: Due to comments

Do you have any idea how I could make this algorithm run faster without implementing a full Boyer-Moore version of it?

greybeard
  • 2,249
  • 8
  • 30
  • 66
Taigi100
  • 1
  • 3
  • can you be a bit more specific about what input doesn't work, and in what way it doesn't work? otherwise we'll have to repeat all your effort! – Eevee Jan 16 '13 at 19:44
  • I'm not sure what does not work since the test file which doesn't is HUGE, but I am gonna upload it anyway. – Taigi100 Jan 16 '13 at 20:24
  • Yep I'm retardedly stupid... My problem was a stupid if... it had <1000 instead of <=1000... – Taigi100 Jan 16 '13 at 20:33
  • P.S. I still need the speed improvement. – Taigi100 Jan 16 '13 at 20:39
  • (String:n, Pattern:m) Complexity of BMH = O(n+m). You can do some pre-checks to avoid the complexity getting O(n^2) in case of some kind of Strings and Patterns. – sameerkn Jan 30 '17 at 08:14

1 Answers1

1

The order of tests in

for(i=Pl-1;T[i]==P[i]&&i>=0;i--)

is wrong. After a complete match, you compare T[-1] to P[-1] before checking whether the index is admissible.

If a mismatch occurs at the last pattern character,

Tl -= D[T[i+1]];
T += D[T[i+1]];

skips according to a character that need not exist (if the pattern end is aligned with the text end).

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • I have realised that and fixed it by adding an if in the for instead of doing the test in the for-test. – Taigi100 Jan 16 '13 at 20:43