
I need to replace textual emoticons with HTML image tags. I compiled the following data:

private $smile = array(">:]", ":-)", ":)", ":o)", ":]", ":3", ":c)", ":>", "=]", "8)", "=)", ":}", ":^)");
private $laugh = array(">:D", ":-D", ":D", "8-D", "x-D", "X-D", "=-D", "=D", "=-3", "8-)");
private $sad = array(">:[", ":-(", ":(",  ":-c", ":c", ":-<", ":-[", ":[", ":{", ">.>", "<.<", ">.<");
private $wink = array(">;]", ";-)", ";)", "*-)", "*)", ";-]", ";]", ";D", ";^)");
private $tongue = array(">:P", ":-P", ":P", "X-P", "x-p", ":-p", ":p", "=p", ":-Þ", ":Þ", ":-b", ":b", "=p", "=P");
private $surprise = array(">:o", ">:O", ":-O", ":O", "°o°", "°O°", ":O", "o_O", "o.O", "8-0");
private $annoyed = array(">:\\", ">:/", ":-/", ":-.", ":\\", "=/", "=\\", ":S");
private $cry = array(":'(", ";'(");

private $t_smile = "<img class=\"smiley\" src=\"/images/emoticons/smile.png\"/>";
private $t_laugh = "<img class=\"smiley\" src=\"/images/emoticons/laugh.png\"/>";
private $t_sad = "<img class=\"smiley\" src=\"/images/emoticons/sad.png\"/>";
private $t_wink = "<img class=\"smiley\" src=\"/images/emoticons/wink.png\"/>";
private $t_tongue = "<img class=\"smiley\" src=\"/images/emoticons/tongue.png\"/>";
private $t_surprise = "<img class=\"smiley\" src=\"/images/emoticons/surprise.png\"/>";
private $t_annoyed = "<img class=\"smiley\" src=\"/images/emoticons/annoyed.png\"/>";
private $t_cry = "<img class=\"smiley\" src=\"/images/emoticons/cry.png\"/>";

Currently I am simply doing, for example:

$str = str_replace($this->laugh, $this->t_laugh, $str);

for each group. It works fine, but I need the replacement to occur only if the emoticons are not surrounded by letters or digits. In other words, I need to compile a regex that contains each emoticon array so that I can use preg_replace instead of str_replace. Is there an easy way to do this without hardcoding the regex and escaping all the necessary characters by hand?

EDIT:

Also, I need to match and replace emoticons that appear at the beginning and end of a string, so simply padding the string with spaces won't suffice.

EDIT 2:

I followed Mark's example and pre-compiled the regex from the arrays using preg_quote as:

private $smile = "#(^|\W)(\>\:\]|\:-\)|\:\)|\:o\)|\:\]|\:3|\:c\)|\:\>|\=\]|8\)|\=\)|\:\}|\:\^\))($|\W)#";
private $laugh = "#(^|\W)(\>\:D|\:-D|\:D|8-D|x-D|X-D|\=-D|\=D|\=-3|8-\)|xD|XD|8D|\=3)($|\W)#";
private $sad = "#(^|\W)(\>\:\[|\:-\(|\:\(|\:-c|\:c|\:-\<|\:-\[|\:\[|\:\{|\>\.\>|\<\.\<|\>\.\<)($|\W)#";
private $wink = "#(^|\W)(\>;\]|;-\)|;\)|\*-\)|\*\)|;-\]|;\]|;D|;\^\))($|\W)#";
private $tongue = "#(^|\W)(\>\:P|\:-P|\:P|X-P|x-p|\:-p|\:p|\=p|\:-Þ|\:Þ|\:-b|\:b|\=p|\=P|xp|XP|xP|Xp)($|\W)#";
private $surprise = "#(^|\W)(\>\:o|\>\:O|\:-O|\:O|°o°|°O°|\:O|o_O|o\.O|8-0)($|\W)#";
private $annoyed = "#(^|\W)(\>\:\\\\|\>\:/|\:-/|\:-\.|\:\\\\|\=/|\=\\\\|\:S|\:\/)($|\W)#";
private $cry = "#(^|\W)(\:'\(|;'\()($|\W)#";

Works perfectly with preg_replace!

dscer

3 Answers


If you want to use a regex:

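// preg_quote() takes one string, so quote each emoticon and join them with "|".
$alts = implode('|', array_map(function ($e) { return preg_quote($e, '#'); }, $this->laugh));
$pat = '#(^|\W)(' . $alts . ')($|\W)#';
$str = preg_replace($pat, '$1' . $this->t_laugh . '$3', $str);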

This basically means the emoticon must be at the start of the string or preceded by a non-word character, and must be followed by the end of the string or another non-word character. preg_quote is necessary in case your emoticon contains any special regex characters.
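
As a rough sketch of how that boundary pattern behaves at the start, at the end, and in the middle of a word: the $laugh and $t_laugh values below are shortened stand-ins for the question's class properties, and keeping the captured boundary characters via $1 and $3 is an assumption about the desired output.

```php
<?php
// Stand-ins for the question's $this->laugh and $this->t_laugh (shortened here).
$laugh   = array(':-D', ':D', '=D');
$t_laugh = '<img class="smiley" src="/images/emoticons/laugh.png"/>';

// preg_quote() handles one string at a time, so quote each emoticon, then join with "|".
$quoted = array_map(function ($e) { return preg_quote($e, '#'); }, $laugh);
$pat    = '#(^|\W)(' . implode('|', $quoted) . ')($|\W)#';

// Assumption: the boundary characters captured as $1 and $3 should be kept.
$results = array();
foreach (array(':D at the start', 'at the end :D', 'abc:Ddef') as $input) {
    $results[] = preg_replace($pat, '$1' . $t_laugh . '$3', $input);
}
print_r($results);
```

The first two inputs get the image tag; the third is left alone because the emoticon sits between letters.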

Also, a better format might be:

$emoticons = array(
    'smile' => array('<img src...', array('>:]',':-)',...)),
    'laugh' => array('<img src....', array(...)),
    ...
);

Then you can loop over everything.
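
A minimal sketch of such a loop, assuming the structure suggested above. The two entries are a shortened, hypothetical subset of the question's data, and replaceEmoticons() is an illustrative helper name, not something from the original code.

```php
<?php
// Hypothetical data in the suggested shape: name => array(image tag, list of emoticons).
$emoticons = array(
    'smile' => array('<img class="smiley" src="/images/emoticons/smile.png"/>', array(':-)', ':)')),
    'sad'   => array('<img class="smiley" src="/images/emoticons/sad.png"/>', array(':-(', ':(')),
);

// Illustrative helper: builds one pattern per group and applies it to the string.
function replaceEmoticons($str, array $emoticons) {
    foreach ($emoticons as $entry) {
        list($tag, $faces) = $entry;
        $quoted = array_map(function ($f) { return preg_quote($f, '#'); }, $faces);
        $pat = '#(^|\W)(' . implode('|', $quoted) . ')($|\W)#';
        // $1 and $3 restore the boundary characters consumed by the pattern.
        $str = preg_replace($pat, '$1' . $tag . '$3', $str);
    }
    return $str;
}

echo replaceEmoticons('good :) bad :(', $emoticons), PHP_EOL;
```

Adding a new emoticon group then only means adding one entry to the array.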


Update

You should use negative lookarounds instead so that side-by-side emoticons match. The pattern then no longer tries to consume the surrounding spaces.

<?php
$smile = array(">:]", ":-)", ":)", ":o)", ":]", ":3", ":c)", ":>", "=]", "8)", "=)", ":}", ":^)");
$laugh = array(">:D", ":-D", ":D", "8-D", "x-D", "X-D", "=-D", "=D", "=-3", "8-)");
$sad = array(">:[", ":-(", ":(",  ":-c", ":c", ":-<", ":-[", ":[", ":{", ">.>", "<.<", ">.<");
$wink = array(">;]", ";-)", ";)", "*-)", "*)", ";-]", ";]", ";D", ";^)");
$tongue = array(">:P", ":-P", ":P", "X-P", "x-p", ":-p", ":p", "=p", ":-Þ", ":Þ", ":-b", ":b", "=p", "=P");
$surprise = array(">:o", ">:O", ":-O", ":O", "°o°", "°O°", ":O", "o_O", "o.O", "8-0");
$annoyed = array(">:\\", ">:/", ":-/", ":-.", ":\\", "=/", "=\\", ":S");
$cry = array(":'(", ";'(");

$ary = array_merge($smile, $laugh, $sad, $wink, $tongue,$surprise,$annoyed,$cry);

foreach ($ary as $a)
{
        $quoted[] = preg_quote($a, '#');
}

$regex = implode('|', $quoted);


$full = '#(?<!\w)(' . $regex .')(?!\w)#';
echo $full.PHP_EOL;
$str = "Testing :) emoticons :D :(";

preg_match_all($full, $str, $matches);
print_r($matches[0]);

Also, try to use single quotes when writing regex patterns: double-quoted strings interpret escape sequences such as \n and \t, whereas single-quoted strings only treat \\ and \' specially. In other words, with double quotes you sometimes need to double your backslashes.
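
For the actual replacement, one possible approach (an assumption, not part of the code above) is to key the image tags by emoticon and let preg_replace_callback pick the right tag in a single pass. $map below is a hypothetical, shortened stand-in for the question's arrays.

```php
<?php
// Hypothetical subset of the question's data: emoticon => image tag.
$map = array(
    ':)' => '<img class="smiley" src="/images/emoticons/smile.png"/>',
    ':D' => '<img class="smiley" src="/images/emoticons/laugh.png"/>',
    ':(' => '<img class="smiley" src="/images/emoticons/sad.png"/>',
);

$quoted = array_map(function ($e) { return preg_quote($e, '#'); }, array_keys($map));

// Single-quoted pieces, so \w reaches the regex engine untouched.
$full = '#(?<!\w)(' . implode('|', $quoted) . ')(?!\w)#';

// Look up the tag for whichever emoticon matched.
$out = preg_replace_callback($full, function ($m) use ($map) {
    return $map[$m[1]];
}, 'Testing :) emoticons :D :(');

echo $out, PHP_EOL;
```

Because the lookarounds consume nothing, the spaces stay in place without any $1 or $3 bookkeeping, and a single pass never re-scans the inserted image tags.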

mpen
  • Does `preg_quote` accept an array? If not, write a method that loops through the array, and returns the escaped array with the surrounding pattern. – Josh Jan 12 '12 at 03:57
  • @Josh: No, it doesn't. http://ca.php.net/manual/en/function.preg-quote.php. You can also put all the "smiles" into a single regex by imploding them with a pipe (after they're preg-quoted). Final string should look like: `#(^|\W)(:-\)|:\)|>:\])($|\W)#` – mpen Jan 12 '12 at 04:00
  • True, then it will just logically OR through each pattern. Nice solution. I would add that to your answer. – Josh Jan 12 '12 at 04:05
  • Thank you for your answer, I am trying to use this regex to detect emoticons in strings but it seems that it is not matching emoticons at the end of the string. Here is a code example of the solution you provided and the issue in question: http://ideone.com/MwfIA – dscer Jan 12 '12 at 06:43
  • @dscer: Err.... I'm not quite sure what's wrong with it. It works if there isn't an icon immediately preceding it: http://ideone.com/Vl8Kf – mpen Jan 12 '12 at 17:19
  • @Mark: It's strange... It's not working in this regex even though all the emoticons are specified: http://ideone.com/clone/ETGAO – dscer Jan 12 '12 at 17:24
  • @dscer: If you throw an "x" right after that D in :D it finds the last one: http://ideone.com/fvBoH I'm wondering if preg_match_all just doesn't like si... no, I just figured it out. Both the :D and the :( match the space between them, and matches aren't allowed to overlap. I think we can find this with negative lookarounds... – mpen Jan 12 '12 at 19:29
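
The overlap described in the last comment can be reproduced with a short script. This is only a sketch using three of the emoticons, and the two patterns are simplified versions of the ones discussed in this answer.

```php
<?php
$str = 'Testing :) emoticons :D :(';

// Boundary groups consume the space between ":D" and ":(", so ":(" is skipped.
preg_match_all('#(^|\W)(:\)|:D|:\()($|\W)#', $str, $consuming);

// Lookarounds assert the same boundaries without consuming any characters.
preg_match_all('#(?<!\w)(:\)|:D|:\()(?!\w)#', $str, $lookaround);

print_r($consuming[2]);   // emoticons found by the consuming pattern
print_r($lookaround[1]);  // emoticons found by the lookaround pattern
```

The first pattern reports two emoticons and the second reports all three.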

Maybe have a formatting loop like

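for ($i = 0; $i < count($smiles); ++$i) {
    // Escape regex metacharacters (and the "~" delimiter) before adding the whitespace matchers.
    $smiles[$i] = '~\s' . preg_quote($smiles[$i], '~') . '\s~';
}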

Then it's just a drop-in for preg_replace($smiles, $t_smiles, $text).
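
One thing to check against the question's requirement about the start and end of the string: a pattern that demands whitespace on both sides will not match there. A small sketch, built like the entries in the loop above with a single emoticon passed through preg_quote:

```php
<?php
// One pattern of the whitespace-delimited form, for the ":)" emoticon.
$pat = '~\s' . preg_quote(':)', '~') . '\s~';

$inside  = preg_match($pat, 'hi :) there'); // whitespace on both sides
$atStart = preg_match($pat, ':) hi');       // nothing before the emoticon
$atEnd   = preg_match($pat, 'hi :)');       // nothing after the emoticon

var_dump($inside, $atStart, $atEnd);
```

Only the first call finds a match.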

atxdba

Something along these lines is probably what you're looking for:

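function toRegex(array $emotes) {
    foreach ($emotes as &$emote)
        $emote = preg_quote($emote, "/");
    // implode() takes the separator first.
    return "/\b" . implode("\b|\b", $emotes) . "\b/";
}

// preg_replace() also needs the subject string; $str is assumed here.
$imaged = preg_replace(toRegex($smiles), $t_smiles, $str);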

Also, as Mark mentioned, you'd be better off with one giant array of all emoticons than a hundred little variables you have to deal with manually.

sirbrialliance
  • word bounds don't work here. i believe there has to be a word character on one side, and a non-word character on the other. the `:`s and `)`s in smilies aren't word characters, so `\b` wouldn't match anything in a string like `" :) "` for example (spaces on either side). actually, it *would* match `x:)` which is the exact opposite of what we want. – mpen Jan 13 '12 at 01:55
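
The word-boundary behaviour described in the comment above can be checked quickly. This is a sketch with a single emoticon; note that for both \b assertions to hold, a word character is needed on each side.

```php
<?php
// \b only matches between a word character and a non-word character.
$pat = '/\b' . preg_quote(':)', '/') . '\b/';

$spaces  = preg_match($pat, ' :) ');  // spaces on both sides
$letters = preg_match($pat, 'x:)y');  // letters on both sides

var_dump($spaces, $letters);
```

The emoticon surrounded by spaces is not matched, while the one glued between letters is, which is the opposite of what the question asks for.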