
I have the following code:

$text = "This is a $1ut ( Y ) @ss @sshole a$$ ass test with grass and passages.";
$blacklist = array(
  '$1ut',
  '( Y )',
  '@ss',
  '@sshole',
  'a$$',
  'ass'
);
foreach ($blacklist as $word) {
  $pattern = "/\b". preg_quote($word) ."\b/i";
  $replace = str_repeat('*', strlen($word));
  $text = preg_replace($pattern, $replace, $text);
}
print_r($text);

which returns the following result:

This is a $1ut ( Y ) @ss @sshole a$$ *** test with grass and passages.

When I remove the word boundaries from the regexp,

$pattern = "/". preg_quote($word) ."/i";

it returns:

This is a **** ***** *** ***hole *** *** test with gr*** and p***ages.

How can I write the regexp so that it leaves words such as passages and grass alone, but completely replaces words such as @sshole?

Vlad Stratulat
  • 5
    It should be added that no matter how much you think a wordlist with cusswords and replacements will help, people will **always** try to find a way around it, and eventually they will. Then it'd just be assh0le or a$$hole instead. – h2ooooooo Sep 26 '12 at 13:05
  • Yes, I know that. Because of that, the list of words will grow. But my question is how to write the regex. This could be used not only to filter swear words but in any similar situation. – Vlad Stratulat Sep 26 '12 at 13:08
  • 1
    I think the more important question is why it isn't finding your words with the \b's. – BugFinder Sep 26 '12 at 13:12
  • 2
    Most regex systems seem not to be able to handle `\b@ss\b` but can easily handle `\bass\b`. Odd. ***Edit*** Apparently `\b` only supports ASCII: http://stackoverflow.com/questions/2881445/utf-8-word-boundary-regex-in-javascript – h2ooooooo Sep 26 '12 at 13:13

1 Answer


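By default, \b treats only [A-Za-z0-9_] as word characters, and it matches only at a position between a word character and a non-word character. A blacklist entry that begins or ends with a symbol such as @, $ or ( has no boundary on that side when it is surrounded by spaces, which is why \b@ss\b never matches in your text.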
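A minimal sketch of that behaviour, assuming the default (non-UTF-8) PCRE mode; the sample strings and variable names are made up for illustration:

```php
<?php
$text = 'an @ss and an ass';

// "ass" starts and ends with letters, so \b finds a boundary on both sides.
$plain = preg_match('/\bass\b/i', $text);

// "@ss" starts with "@": a space followed by "@" is two non-word characters,
// so there is no boundary before the "@" and the pattern cannot match.
$symbol = preg_match('/\b@ss\b/i', $text);

// Glued to a letter, a boundary does exist between "x" and "@".
$glued = preg_match('/\b@ss\b/i', 'x@ss');

var_dump($plain, $symbol, $glued); // int(1) int(0) int(1)
```

The last case shows that \b@ss\b is not broken as such; it just asks for the opposite of what the filter wants.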

Note that the pattern is built from an ordinary PHP string, and PHP processes backslash escapes in double-quoted strings before the regex engine ever sees the pattern. Writing \\s in the string is the safe way to pass \s to the regex.

Using the Regex /(^|\s)WORD($|\s)/i seems to work.

Code example:

$text = "This is a $1ut ( Y ) @ss @sshole a$$ ass test with grass and passages.";
$blacklist = array(
  '$1ut',
  '( Y )',
  '@ss',
  '@sshole',
  'a$$',
  'ass'
);
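foreach ($blacklist as $word) {
  // Match the word only when it is delimited by whitespace or the start/end of the string.
  $pattern = "/(^|\\s)" . preg_quote($word, '/') . "($|\\s)/i";
  // The delimiters are part of the match, so put a space back on each side.
  $replace = " " . str_repeat('*', strlen($word)) . " ";
  $text = preg_replace($pattern, $replace, $text);
}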
echo $text;

Output:

This is a **** ***** *** ******* *** *** test with grass and passages.

Be aware that the replacement always puts a space on each side of the match, so if your string starts or ends with one of these words, the result gets an extra leading or trailing space. You can take care of this with trim().
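An alternative that avoids the extra spaces altogether is to use lookarounds, which check the neighbouring characters without making them part of the match. The following is only a sketch, assuming PHP 7 or later; the function name `censor_whitespace_delimited` and the sample text are hypothetical:

```php
<?php
// Hypothetical helper: replace blacklisted words that are delimited by
// whitespace or by the start/end of the string.
function censor_whitespace_delimited(string $text, array $blacklist): string
{
    foreach ($blacklist as $word) {
        // (?<!\S) = not preceded by a non-space character,
        // (?!\S)  = not followed by a non-space character.
        // Neither consumes anything, so the surrounding spaces stay untouched.
        $pattern = '/(?<!\S)' . preg_quote($word, '/') . '(?!\S)/i';
        $text = preg_replace($pattern, str_repeat('*', strlen($word)), $text);
    }
    return $text;
}

$blacklist = ['$1ut', '( Y )', '@ss', '@sshole', 'a$$', 'ass'];
$result = censor_whitespace_delimited('ass ass @sshole grass a$$', $blacklist);
echo $result, "\n"; // *** *** ******* grass ***
```

Because the delimiters are not consumed, two blacklisted words separated by a single space are both replaced, and no trim() is needed afterwards.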

Update:

Also be aware that this doesn't account for punctuation in any way.

For example, "the other user has an ass. and it is nice" would go through unchanged.

To handle this, you could extend the pattern even further:

/(^|\s|!|,|\.|;|:|\-|_|\?)WORD($|\s|!|,|\.|;|:|\-|_|\?)/i

This means you also have to change the replacement so that it puts the captured delimiters back:

$text = "This is a $1ut ( Y ) @ss?@sshole you're an ass. a$$ ass test with grass and passages.";
$blacklist = array(
  '$1ut',
  '( Y )',
  '@ss',
  '@sshole',
  'a$$',
  'ass'
);
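foreach ($blacklist as $word) {
  // Capture the delimiter on each side so it can be restored in the replacement.
  $pattern = "/(^|\\s|!|,|\\.|;|:|\\-|_|\\?)" . preg_quote($word, '/') . "($|\\s|!|,|\\.|;|:|\\-|_|\\?)/i";
  // $1 and $2 are the captured leading and trailing delimiters.
  $replace = '$1' . str_repeat('*', strlen($word)) . '$2';
  $text = preg_replace($pattern, $replace, $text);
}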
echo $text;

You would then add any other punctuation you need to both groups.

Output:

This is a **** ***** ***?******* you're an ***. *** *** test with grass and passages.
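Instead of listing punctuation by hand, one could require that the word is simply not glued to a word character on either side. The following is a sketch of that idea rather than a drop-in replacement, assuming PHP 7 or later; `censor_single_pass` is a hypothetical name, and the code also combines the blacklist into a single pattern so the text is scanned once:

```php
<?php
// Hypothetical helper: one combined pattern, applied in a single pass.
function censor_single_pass(string $text, array $blacklist): string
{
    // Longest entries first, so that where two entries could both match at
    // the same position, the longer one is tried first.
    usort($blacklist, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });
    $alternatives = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $blacklist);

    // (?<!\w) and (?!\w): no word character directly before or after the match.
    // Spaces, punctuation, and the start/end of the string all qualify.
    $pattern = '/(?<!\w)(?:' . implode('|', $alternatives) . ')(?!\w)/i';

    return preg_replace_callback($pattern, function ($m) {
        // Replace each match with as many asterisks as it has bytes.
        return str_repeat('*', strlen($m[0]));
    }, $text);
}

$blacklist = ['$1ut', '( Y )', '@ss', '@sshole', 'a$$', 'ass'];
$text = 'This is a $1ut ( Y ) @ss?@sshole you\'re an ass. a$$ ass test with grass and passages.';
$result = censor_single_pass($text, $blacklist);
echo $result, "\n";
```

For the sample text this gives the same result as the punctuation list above, while grass and passages are still skipped because the match would be preceded by a letter.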

h2ooooooo