
I'm trying to implement the Boyer-Moore algorithm in C to search for a particular word in a .pcap file. I took the code from http://ideone.com/FhJok5 and am using it as is.

I simply pass each packet as a string, together with the keyword I'm searching for, to the function search(). When I run my code, it gives different values every time. Sometimes it gives the correct value, but most of the time it fails to identify some occurrences.

I have also obtained results from a naive algorithm implementation, and those results are always correct.

I am using Ubuntu 12.04 on VMware 10.0.1. Language: C.

My question is: shouldn't it give the same result every time, whether right or wrong? The output keeps changing every time I run the program on the same inputs, and on some runs it gives the correct answer too. Mostly the result varies between 3 or 4 values.

What I have done so far for debugging:

  1. Passed plain strings instead of packets: it works perfectly and gives the same, correct value every time.
  2. Checked the pcap part: I can see that all packets are being passed to the function (I checked by printing the packet frame number).
  3. Sent the same packets to the naive algorithm code: it gives correct results.

Please give me some idea of what the issue could be. I suspect something is wrong with memory management, but how do I find out what?

Thanks in advance.

# include <limits.h>
# include <string.h>
# include <stdio.h>

# define NO_OF_CHARS 256

// A utility function to get maximum of two integers
int max (int a, int b) { return (a > b)? a: b; }

// The preprocessing function for Boyer Moore's bad character heuristic
void badCharHeuristic( char *str, int size, int badchar[NO_OF_CHARS])
{
    int i;

    // Initialize all occurrences as -1
    for (i = 0; i < NO_OF_CHARS; i++)
         badchar[i] = -1;

    // Fill the actual value of last occurrence of a character
    for (i = 0; i < size; i++)
         badchar[(int) str[i]] = i;
}

/* A pattern searching function that uses Bad Character Heuristic of
   Boyer Moore Algorithm */
void search( char *txt,  char *pat)
{
    int m = strlen(pat);
    int n = strlen(txt);

    int badchar[NO_OF_CHARS];

    /* Fill the bad character array by calling the preprocessing
       function badCharHeuristic() for given pattern */
    badCharHeuristic(pat, m, badchar);

    int s = 0;  // s is shift of the pattern with respect to text
    while(s <= (n - m))
    {
        int j = m-1;

        /* Keep reducing index j of pattern while characters of
           pattern and text are matching at this shift s */
        while(j >= 0 && pat[j] == txt[s+j])
            j--;

        /* If the pattern is present at current shift, then index j
           will become -1 after the above loop */
        if (j < 0)
        {
            printf("\n pattern occurs at shift = %d", s);

            /* Shift the pattern so that the next character in text
               aligns with the last occurrence of it in pattern.
               The condition s+m < n is necessary for the case when
               pattern occurs at the end of text */
            s += (s+m < n)? m-badchar[txt[s+m]] : 1;

        }

        else
            /* Shift the pattern so that the bad character in text
               aligns with the last occurrence of it in pattern. The
               max function is used to make sure that we get a positive
               shift. We may get a negative shift if the last occurrence
               of bad character in pattern is on the right side of the
               current character. */
            s += max(1, j - badchar[txt[s+j]]);
    }
}

/* Driver program to test above function */
int main()
{
    char txt[] = "ABAAAABAACD";
    char pat[] = "AA";
    search(txt, pat);
    return 0;
}
  • is it getting the exact same input every time? if not, then it isn't surprising it gets different answers – Marshall Tigerus May 07 '14 at 16:39
  • Make sure you're compiling at the maximum warning level to detect uninitialized variables, type mismatches, etc. – nobody May 07 '14 at 17:09
  • A potential issue that I can see is that you are using (probably signed) char values as an index. I get warnings for two occurrences, but there really are three, because the cast to `(int)` when assigning to `badchar` is useless in terms of signedness. You could cast to `(unsigned char)` or `(uint8_t)` from `<stdint.h>` or you could use pointers to unsigned char internally throughout. This is only an issue if your text and pattern are not 7-bit clean, i.e. contain non-ASCII characters. – M Oehm May 07 '14 at 17:17
  • @MarshallTigerus: Yes, I am giving the same input every time. Still, it gives different values. I found that sometimes it misses some of the occurrences. – D V Santhosh Kiran May 08 '14 at 13:16
  • @AndrewMedico: I am compiling it using gcc. Can you please explain how to compile at the maximum warning level? – D V Santhosh Kiran May 08 '14 at 13:18
  • The `-W...` options activate individual warnings and `-Wall` enables most of them, but not all, despite the name. There's also `-Wextra` for even stricter warnings. – M Oehm May 08 '14 at 14:05
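
The signedness point raised in the comments can be sketched as follows. This is only an illustration under assumptions, not a confirmed diagnosis: the name `search_bytes` and its explicit length parameters are hypothetical and do not appear in the original code. The sketch indexes `badchar` through `unsigned char`, so the index always lies in 0..255, and it takes the buffer lengths as arguments instead of calling `strlen()`. The second change rests on the assumption that packet data is binary: it may contain zero bytes and need not end with one, in which case `strlen()` would stop early or read past the buffer, and a negative `char` index would read memory outside `badchar`. Either effect could plausibly explain results that change between runs. Compiling with `gcc -Wall -Wextra` should flag the `char` subscripts in the original.

```c
#include <stddef.h>
#include <stdio.h>

#define NO_OF_CHARS 256

/* Hypothetical helper (not part of the original code): bad-character
   Boyer-Moore search over raw bytes with explicit lengths.
   Prints every shift at which pat occurs and returns the match count. */
size_t search_bytes(const unsigned char *txt, size_t n,
                    const unsigned char *pat, size_t m)
{
    int badchar[NO_OF_CHARS];
    size_t i, s = 0, count = 0;

    if (m == 0 || m > n)
        return 0;

    /* Last occurrence of each byte value in the pattern, -1 if absent */
    for (i = 0; i < NO_OF_CHARS; i++)
        badchar[i] = -1;
    for (i = 0; i < m; i++)
        badchar[pat[i]] = (int)i;   /* unsigned char index: always 0..255 */

    while (s <= n - m) {
        long j = (long)m - 1;

        /* Compare right to left at the current shift s */
        while (j >= 0 && pat[j] == txt[s + (size_t)j])
            j--;

        if (j < 0) {
            printf("pattern occurs at shift = %zu\n", s);
            count++;
            /* Align the byte after the match with its last occurrence */
            s += (s + m < n) ? (size_t)((long)m - badchar[txt[s + m]]) : 1;
        } else {
            /* Align the mismatching byte; always advance by at least 1 */
            long shift = j - badchar[txt[s + (size_t)j]];
            s += (shift > 1) ? (size_t)shift : 1;
        }
    }
    return count;
}
```

With a packet buffer, the call would pass the captured length explicitly, for example `search_bytes(packet, caplen, (const unsigned char *)"AA", 2)`, where `packet` and `caplen` stand for whatever buffer and length the capture code provides (if libpcap is used, the `caplen` field of `struct pcap_pkthdr` is one candidate).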
