I think you're going about this backwards. Instead of trying to write a regular expression for what is not a word, define what a word is and capture every character sequence that matches that definition.
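$special_words = array("dr.", "a.sh.", "a.k");
// Escape regex metacharacters (and the ~ delimiter) in each special word.
array_walk($special_words, function (&$item, $key) { $item = preg_quote($item, '~'); });
// Special words are tried first; \w+ is the fallback for ordinary words.
$regex = '~(?<!\w)(' . implode('|', $special_words) . '|\w+)(?!\w)~';
$str = 'word word, dr. word: a.sh. word a.k word?!..';
preg_match_all($regex, $str, $matches);
// $matches[0] holds the full text of every match.
var_dump($matches[0]);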
The key pieces here are the array of special words, the array_walk call, and the regular expression.
array_walk
This line, right after your array definition, walks through each of your special words and escapes all of the regex special characters (like . and ?), including the delimiter we're going to use later. That way, you can define whatever words you like and you don't have to worry about how it will affect the regular expression.
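To make the escaping step concrete, here is a minimal sketch of what preg_quote produces for each special word when ~ is passed as the delimiter. The entry "c~d?" is a made-up word added only to show the delimiter itself being escaped, and array_map is used instead of array_walk purely so the original array stays untouched; neither is part of the code above.

```php
<?php
// Sketch only: inspect what preg_quote() does to each special word.
// "c~d?" is a hypothetical entry, included to show the ~ delimiter being escaped.
$special_words = array("dr.", "a.sh.", "a.k", "c~d?");

// array_map returns a new array, leaving $special_words unchanged.
$quoted = array_map(function ($word) {
    // Second argument: the delimiter that should also be escaped.
    return preg_quote($word, '~');
}, $special_words);

print_r($quoted);
// The escaped words are: dr\.  a\.sh\.  a\.k  c\~d\?
```

Every dot, question mark, and tilde now carries a backslash, so each word is matched literally once it is spliced into the pattern.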
Regular Expression.
The regex is actually pretty simple. Implode the special words using a | as glue, then add another pipe and your standard word definition (I chose \w+ because it makes the most sense to me). Surround that giant alternation with parentheses to group it; I also added a lookbehind and a lookahead to ensure we aren't stealing from the middle of a word. Because the regex engine scans left to right and tries the alternatives in order, the a in a.sh. won't be split off into its own word: the a.sh. special word captures it first. The exception is something like a.sh.e, where the lookahead rejects a.sh. (a word character follows it), so the three parts match as three separate words: a, sh, and e.
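The a.sh. versus a.sh.e behavior is easy to check directly. The sketch below wraps the same steps in a helper named extract_words, which is a hypothetical name introduced here for illustration and not part of the original code; the two input strings are made up as well.

```php
<?php
// Hypothetical helper (not in the original answer): same quoting and regex,
// wrapped in a function so two inputs can be compared.
function extract_words(string $str, array $special_words): array
{
    // Escape each special word, including the ~ delimiter.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '~');
    }, $special_words);

    // Special words come first in the alternation; \w+ is the fallback.
    $regex = '~(?<!\w)(' . implode('|', $quoted) . '|\w+)(?!\w)~';

    preg_match_all($regex, $str, $matches);
    return $matches[0];
}

$special = array("dr.", "a.sh.", "a.k");

// a.sh. is followed by a space, so the special word matches as a whole.
$whole = extract_words('see a.sh. now', $special);
print_r($whole); // see, a.sh., now

// a.sh. is followed by "e", so the lookahead rejects it and \w+ takes over.
$split = extract_words('see a.sh.e now', $special);
print_r($split); // see, a, sh, e, now
```

One thing to keep in mind with this design: alternatives are tried in the order they appear, so if one special word is a prefix of another, the order of the array can decide which one wins.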
Check it out.