I want to implement a generalization of the Boyer-Moore-Horspool algorithm that handles wildcards, where a wildcard matches any single letter in the word. This means that the pattern h _ _ s e should be found twice in the text horsehouse (once as "horse" and once as "house").

I need help implementing this. I can't get a deep enough understanding of the algorithm to figure it out by myself. Any tips?

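    int [] createBadCharacterTable(char [] needle) {

        int [] badShift = new int [256]; // assumes every character code is below 256

        int last = needle.length - 1;

        // A wildcard before the last position can line up with any character,
        // so the rightmost such wildcard caps every shift.
        int maxShift = needle.length;
        for(int i = 0; i < last; i++) {
            if(needle[i] == '_') {
                maxShift = last - i;
            }
        }

        for(int i = 0; i < 256; i++) {
            badShift[i] = maxShift;
        }

        for(int i = 0; i < last; i++) {
            badShift[(int) needle[i]] = Math.min(maxShift, last - i);
        }

        return badShift;
    }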

    int boyerMooreHorsepool(String word, String text) {

        char [] needle = word.toCharArray();
        char [] haystack = text.toCharArray();

        // Nothing to search for, or the pattern cannot fit into the text
        if(needle.length == 0 || needle.length > haystack.length) {
            return -1;
        }

        int [] badShift = createBadCharacterTable(needle);

        int offset = 0;
        int scan = 0;
        int last = needle.length - 1;
        int maxoffset = haystack.length - needle.length;

        while(offset <= maxoffset) {
            // Compare right to left; '_' in the needle matches any character
            for(scan = last; (needle[scan] == haystack[scan+offset] || needle[scan] == (int) '_'); scan--) {

                if(scan == 0) { // Match found
                    return offset;
                }
            }
            // Shift by the table entry for the text character under the needle's last position
            offset += badShift[(int) haystack[offset + last]];
        }
        return -1; // No match
    }
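
Since the goal is to find the pattern twice in horsehouse, here is a minimal sketch of a variant that collects every match offset instead of returning the first one. The class name WildcardHorspool, the method names, and the choice of a table covering the full char range are assumptions made for illustration, not part of the code above. The key point it tries to show is that a wildcard at index j (before the last position) can line up with any text character, so no shift may be larger than last - j for the rightmost such wildcard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WildcardHorspool {
    static final char WILDCARD = '_';

    // Shift table indexed by the text character under the needle's last position.
    static int[] buildShiftTable(char[] needle) {
        int last = needle.length - 1;
        // The rightmost wildcard before the last position caps every shift,
        // because it could match whatever character the text holds there.
        int maxShift = needle.length;
        for (int i = 0; i < last; i++) {
            if (needle[i] == WILDCARD) {
                maxShift = last - i;
            }
        }
        // One entry per possible char value, so no index can go out of bounds.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, maxShift);
        for (int i = 0; i < last; i++) {
            if (needle[i] != WILDCARD) {
                shift[needle[i]] = Math.min(maxShift, last - i);
            }
        }
        return shift;
    }

    // Returns the start offsets of all (possibly overlapping) matches.
    static List<Integer> findAll(String word, String text) {
        List<Integer> hits = new ArrayList<>();
        char[] needle = word.toCharArray();
        char[] haystack = text.toCharArray();
        if (needle.length == 0 || needle.length > haystack.length) {
            return hits;
        }
        int[] shift = buildShiftTable(needle);
        int last = needle.length - 1;
        int maxOffset = haystack.length - needle.length;
        int offset = 0;
        while (offset <= maxOffset) {
            int scan = last;
            // Compare right to left; a wildcard matches any character.
            while (scan >= 0
                    && (needle[scan] == WILDCARD || needle[scan] == haystack[offset + scan])) {
                scan--;
            }
            if (scan < 0) {
                hits.add(offset);
            }
            // The same shift rule applies after a match and after a mismatch.
            offset += shift[haystack[offset + last]];
        }
        return hits;
    }
}
```

With this sketch, WildcardHorspool.findAll("h__se", "horsehouse") should give the offsets 0 and 5. The price of wildcards is smaller shifts: a wildcard close to the end of the pattern pushes the search toward stepping one position at a time.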
user265767
  • Do you have a specific question? – Collin Nov 21 '12 at 13:43
  • Edit: Added a solution that uses _ (underscore) as the wildcard. I just added needle[scan] == (int) '_' as a test in the for loop. – user265767 Nov 21 '12 at 18:21
  • Is this an assignment? I spotted someone else posting just this question again in Nov 2013! (Before deleting it) If not, then you don't need to open-code it - so just use `Pattern.compile(Pattern.quote(needle).replaceAll("_", "\\\\E.\\\\Q"));` – Luke Usherwood Nov 23 '13 at 08:24
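
The regex approach from the last comment can be sketched as follows. The class name WildcardRegex, the method names, and the Pattern.DOTALL flag (so that the wildcard also matches line terminators) are assumptions added for illustration. Restarting the search one position after each hit is one way to report overlapping matches as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WildcardRegex {
    // Quote the needle literally, then turn every '_' into an unquoted '.'.
    static Pattern toPattern(String needle) {
        String regex = Pattern.quote(needle).replaceAll("_", "\\\\E.\\\\Q");
        return Pattern.compile(regex, Pattern.DOTALL);
    }

    // Returns the start offsets of all (possibly overlapping) matches.
    static List<Integer> findAll(String needle, String text) {
        List<Integer> hits = new ArrayList<>();
        Matcher m = toPattern(needle).matcher(text);
        int from = 0;
        while (from <= text.length() && m.find(from)) {
            hits.add(m.start());
            from = m.start() + 1; // step one past the hit to allow overlaps
        }
        return hits;
    }
}
```

For example, WildcardRegex.findAll("h__se", "horsehouse") should give the offsets 0 and 5.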
