
This is a very strange situation. We have a list of bad words, and everything worked perfectly until a customer named "Claudia" tried to submit the form and was blocked, because "Claudia" contains "audi":

$blocked = ['audi','opel','vw','mercedes','porsche'];
$input = 'Claudia';
$matched = preg_match_all("/(".implode('|', $blocked).")/i", $input);


if($matched > 0) {
    echo "Your word: {$input} is blocked";
} else {
    echo "Your word: {$input} is OK";
}

How can I make this bad-words checker accept the input "Claudia" while still blocking inputs such as:

"my audi"
"-audi-"
"**audi**"

... or any other variation where "audi" appears as a word on its own?

lewis4u
  • What is your PHP version? – Yassine CHABLI Mar 14 '18 at 11:45
  • PHP version is 5.64 – lewis4u Mar 14 '18 at 11:45
  • Maybe I can whitelist some words? I don't know how. I could add another check: if the word is in the whitelist, let it through. – lewis4u Mar 14 '18 at 11:46
  • Just hope a person called mercedes never submits the form. ;p – Lawrence Cherone Mar 14 '18 at 11:47
  • You can use **word boundaries** to block only full words [word boundaries](https://stackoverflow.com/questions/6531724/how-exactly-do-regular-expression-word-boundaries-work-in-php) – Adder Mar 14 '18 at 11:48
  • You've run into the Scunthorpe Problem (https://en.wikipedia.org/wiki/Scunthorpe_problem). You can add additional checks to work around it but ultimately you will always run into problems like this sooner or later if you try to automatically filter out input. Maybe it would be better to store the input anyway but flag it as suspect for a human moderator to review? – GordonM Mar 14 '18 at 11:56
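The whitelist idea from the comments could look roughly like this. It is a minimal sketch, assuming an exact, case-insensitive match against a hand-maintained list; `isAllowed` and `$whitelist` are hypothetical names, not part of the original code. It keeps the question's substring check and only exempts inputs that are listed explicitly.

```php
<?php
// Hypothetical whitelist check (sketch): inputs that exactly match an
// allowed entry (ignoring case) are accepted before the question's
// substring-based blocklist check runs.
function isAllowed($input, array $blocked, array $whitelist)
{
    // Case-insensitive exact match against the hypothetical whitelist.
    foreach ($whitelist as $allowed) {
        if (strcasecmp($input, $allowed) === 0) {
            return true;
        }
    }

    // Same substring check as the question, with each term escaped so it
    // is matched literally inside the regex.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $blocked);

    return preg_match('/(' . implode('|', $escaped) . ')/i', $input) !== 1;
}

$blocked   = ['audi', 'opel', 'vw', 'mercedes', 'porsche'];
$whitelist = ['Claudia']; // hypothetical list of known false positives

var_dump(isAllowed('Claudia', $blocked, $whitelist)); // bool(true)
var_dump(isAllowed('my audi', $blocked, $whitelist)); // bool(false)
```

The drawback is that every new false positive has to be added to the list by hand, which is exactly the Scunthorpe problem described in the last comment.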

1 Answer


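You can surround the regular expression with word boundary markers (\b), which restrict it to matching whole words only. \b matches at a position between a word character (letter, digit or underscore) and a non-word character or the start/end of the string, so "audi" inside "Claudia" no longer matches, while "my audi", "-audi-" and "**audi**" still do. Change the line to: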

$matched = preg_match_all("/\b(".implode('|', $blocked).")\b/i", $input);

See https://eval.in/971767
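The same word-boundary approach can be wrapped in a small helper. This is a minimal sketch that avoids scalar type hints so it should also run on PHP 5.6; the function name `containsBlockedWord` is hypothetical. Unlike the one-liner above, it passes each term through `preg_quote()`, so a blocked word containing regex metacharacters would still be matched literally.

```php
<?php
// Sketch of the word-boundary check; containsBlockedWord is a hypothetical name.
function containsBlockedWord($input, array $blocked)
{
    // Escape each term so characters like "." or "+" are treated literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $blocked);

    // \b on both sides: a term only matches when it is not part of a longer word.
    $pattern = '/\b(' . implode('|', $escaped) . ')\b/i';

    return preg_match($pattern, $input) === 1;
}

$blocked = ['audi', 'opel', 'vw', 'mercedes', 'porsche'];

foreach (['Claudia', 'my audi', '-audi-', '**audi**'] as $input) {
    $status = containsBlockedWord($input, $blocked) ? 'blocked' : 'OK';
    echo "Your word: {$input} is {$status}\n";
}
```

This prints "Claudia is OK" and "blocked" for the other three inputs. Because \b treats letters, digits and the underscore as word characters, inputs such as "my_audi" or "audi2" are not blocked by this pattern.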

iainn