
I implemented this "bad word" check function in PHP:

# bad word detector
function check_badwords($string) {
    $badwords = array(/* a number of words some may find inappropriate for SE */);
    foreach($badwords as $item) {
        if(stripos($string, $item) !== false) return true;
    }
    return false;
}

It works alright, except I'm having a little problem. If the $string is:

Who is the best guitarist ever?

...it returns true, because ho (in the $badwords array) matches inside Who (in $string). How can the function be modified so that it only matches complete words, and not parts of words?

  • check_badwords('She is a ho'); //should return true
  • check_badwords('Who is she?'); //should return false

Thanks!

deg
Andres SK
  • https://stackoverflow.com/questions/26177013/filter-exact-word-not-wholeword this should help. – deg Sep 05 '17 at 21:39
  • [How to find full words only in string](https://stackoverflow.com/questions/8898825/how-to-find-full-words-only-in-string) as another duplicate. – spectras Sep 05 '17 at 21:44
  • Your array of words isn't sufficiently exhaustive. Without even scrolling I can already tell that you're missing a whole class of compound words beginning with anus. Better to just split the string on space and check each word to see if it has an entry in urban dictionary. – Don't Panic Sep 05 '17 at 22:09
  • Without sounding like a tw*t: Is it really necessary to post a big list of profanity / swear words on SO to get a programming question across? The problem could still be solved if the array contained `unicorn`, `donut` and `icecream`. – ccKep Sep 05 '17 at 22:15
  • [Clbuttic rookie mistake.](https://en.wikipedia.org/wiki/Scunthorpe_problem) – Sammitch Sep 05 '17 at 22:24

3 Answers


You probably want to replace stripos with preg_match.

If you can come up with a better regex, more power to you:

preg_match("/\s($string){1}\s/", $input_line, $output_array);
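
One caveat with the pattern above: \s needs an actual whitespace character on both sides, so a word at the very start or end of the input is missed. Below is a minimal sketch of a variant that also accepts the string edges. The helper name contains_word is hypothetical, and unicorn is only a placeholder word.

```php
<?php
// Hypothetical helper: true if $word occurs in $input delimited by
// whitespace or by the start/end of the string (case-insensitive).
function contains_word($input, $word) {
    // preg_quote escapes regex metacharacters in the word; '/' is the delimiter.
    $pattern = '/(?:^|\s)' . preg_quote($word, '/') . '(?:\s|$)/i';
    return preg_match($pattern, $input) === 1;
}

var_dump(contains_word('She rides a unicorn', 'unicorn'));  // bool(true): word at the end
var_dump(contains_word('unicorns are great', 'unicorn'));   // bool(false): only part of a word
```

Punctuation next to the word (as in `unicorn?`) is still not treated as a delimiter by this variant.
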
ren.rocks

In order to check for complete words you should use regular expressions:

function check_badwords($string)
{
    $badwords = array(/* the big list of words here */);
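    // Create the regex; the "i" modifier keeps the check case-insensitive, like stripos
    $re = '/\b('.implode('|', $badwords).')\b/i';
    // preg_match returns 1 on a match and 0 otherwise; convert that to a boolean
    return preg_match($re, $string) === 1;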
}

How the regex works

The regular expression starts and ends with the special sequence \b, which matches a word boundary: a position where a word character is followed by a non-word character or vice versa, or where a word character sits at the very start or end of the string. Word characters are letters, digits and the underscore.

Between the two word boundaries there is a subpattern that contains all the bad words separated by |. The subpattern matches any of the bad words.
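
To see the boundaries in action, here is a small sketch with a placeholder list (the words are stand-ins, not the real array). It also runs each word through preg_quote, on the assumption that an entry might contain regex metacharacters such as `.` or `+`:

```php
<?php
// Placeholder list; the words are stand-ins, not the question's real array.
$badwords = array('unicorn', 'donut', 'icecream');

// Escape each word so characters such as '.' or '+' are matched literally.
$quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $badwords);

// \b(...)\b matches any listed word only as a whole word; "i" ignores case.
$re = '/\b(?:' . implode('|', $quoted) . ')\b/i';

var_dump(preg_match($re, 'I ate a Donut!'));      // int(1): "!" is a non-word character
var_dump(preg_match($re, 'Two donuts, please'));  // int(0): no boundary between "donut" and "s"
```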

If you want to know what bad word was found you can change the function:

function check_badwords($string)
{
    $badwords = array(/* the big list of words here */);
    $re = '/\b('.implode('|', $badwords).')\b/i';
    // Check for matches, save the first match in $match
    $result = preg_match($re, $string, $match);
    // if $result is 1 then $match[1] contains the first bad word found in $string
    return $result === 1;
}
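
Note that $match is local to the function, so the caller never sees the word. One possible way to expose it is sketched below; the name find_badword and the placeholder words are hypothetical, and a non-empty word list is assumed.

```php
<?php
// Hypothetical variant: returns the first listed word found in $string
// as a whole word, or null when there is none. Assumes $badwords is not empty.
function find_badword($string, array $badwords) {
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $badwords);
    $re = '/\b(' . implode('|', $quoted) . ')\b/i';
    // preg_match returns 1 on a match and fills $match; $match[1] is the captured word.
    if (preg_match($re, $string, $match) === 1) {
        return $match[1];
    }
    return null;
}

$words = array('unicorn', 'donut', 'icecream');
var_dump(find_badword('Free icecream for all', $words));  // the string "icecream"
var_dump(find_badword('Nothing to see here', $words));    // NULL
```
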
axiac
  • Could we just use placeholders for those words instead? I can't think of a logical reason to have profanity / swearing anywhere on SO, even if this is a valid question in and of itself. – ccKep Sep 05 '17 at 22:11

You can also lowercase $string, split it into words, and then, instead of using stripos or a regular expression, check each word with in_array(). That matches against whole words only.
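
Taken literally, in_array($string, $badwords) only succeeds when the entire string equals one listed word, so the input has to be split into words first. A rough sketch of that idea follows. The function name check_badwords_tokens is hypothetical, and it assumes the list holds lowercase ASCII words.

```php
<?php
// Hypothetical sketch of the in_array() approach.
// Assumes $badwords contains lowercase ASCII words.
function check_badwords_tokens($string, array $badwords) {
    // Lowercase, then split on every run of characters that is not
    // a letter, digit or underscore; empty pieces are dropped.
    $words = preg_split('/\W+/', strtolower($string), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Strict comparison: the token must equal a listed word exactly.
        if (in_array($word, $badwords, true)) {
            return true;
        }
    }
    return false;
}

$list = array('unicorn', 'donut', 'icecream');
var_dump(check_badwords_tokens('I ate a Donut!', $list));      // bool(true)
var_dump(check_badwords_tokens('Two donuts, please', $list));  // bool(false)
```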

MarkSkayff