9

The title may sound odd, but I'm trying to set up a preg_replace that cleans up after messy writers in a textarea. It has to:

  1. if there is an exclamation mark, there should not be another one right after it.
  2. if there is a period followed by a comma (.,), the comma wins and the result has to be ,
  3. when there are one or more spaces before a comma, they should be removed.
  4. the sentence cannot start or end with a comma.
  5. there should never be more than 2 of the same letter in a row.
  6. a space must always be present after a comma.

E.g.:

  • ,My house, which is green., is nice!
  • My house..., which is green, is nice!!!
  • My house ,which is green,,, is nice!!

The end result should always be:

My house, which is green, is nice!

Is there an already built regex that takes care of this?

Solution: check out FakeRainBrigand's answer below!

Community
Andres SK

2 Answers

8

I might have to use this for my own sites... nice idea!

<?php

$text = 'My hooouse..., which is greeeeeen , is nice!!!  ,And pretty too...';

$pats = array(
'/([.!?]\s{2}),/',  # Abc.  ,Def
'/\.+(,)/',         # ......,
'/(!)!+/',          # abc!!!!!!!!
'/\s+(,)/',         # abc   , def
'/([a-zA-Z])\1\1/', # greeeeeeen
'/,(?!\s)/');       # abc,def (no group here, so '$1' deletes the comma)

$fixed = preg_replace($pats, '$1', $text);

echo $fixed;
echo "\n\n";

?>

And the 'modified' version of $text: "My house, which is green, is nice!  And pretty too..." (none of these patterns collapses the double space or the trailing dots).

UPDATE: Here's the version that handles "abc,def" -> "abc, def".

<?php

$text = 'My hooouse..., which is greeeeeen ,is nice!!!  ,And pretty too...';

$pats = array(
'/([.!?]\s{2}),/', # Abc.  ,Def
'/\.+(,)/',        # ......,
'/(!)!+/',         # abc!!!!!!!!
'/\s+(,)/',        # abc   , def
'/([a-zA-Z])\1\1/');      # greeeeeeen

$fixed = preg_replace($pats, '$1', $text);
$really_fixed = preg_replace('/,(?!\s)/', ', ', $fixed);

echo $really_fixed;
echo "\n\n";
?>

I would think this is a bit slower, since it adds a second preg_replace call.
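If the extra call is a concern, preg_replace also accepts an array of replacements, paired with the patterns by position, so the comma-spacing pattern can carry its own replacement in the same call. A minimal sketch with a shortened sample string; the variable name $reps is only illustrative:

```php
<?php
// Same patterns as the update above, with one replacement per pattern
// so that a single preg_replace() call is enough.
$text = 'My hooouse..., which is green ,is nice!!!';

$pats = array(
    '/([.!?]\s{2}),/',   // Abc.  ,Def
    '/\.+(,)/',          // ......,
    '/(!)!+/',           // abc!!!!!!!!
    '/\s+(,)/',          // abc   , def
    '/([a-zA-Z])\1\1/',  // hooouse
    '/,(?!\s)/',         // abc,def
);

// Replacements are matched to patterns by position; only the last differs.
$reps = array('$1', '$1', '$1', '$1', '$1', ', ');

$fixed = preg_replace($pats, $reps, $text);

echo $fixed, "\n";
```

The patterns are still applied in array order, so the space is added only after the earlier cleanups have run. For this sample the result is "My house, which is green, is nice!".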

Brigand
  • I should note that the last pattern on there doesn't really work... you'd have to call a separate preg_replace for that one because the replacement of ```'$1'``` doesn't work. If you think it's worth it, I can make the change. – Brigand Dec 05 '11 at 21:36
  • 2
    Facebook needs this, as does almost every other site out there. – Bojangles Dec 05 '11 at 21:38
  • nice! what do you mean with "the last pattern on there doesn't really work"? can you make the change to know what you mean? – Andres SK Dec 05 '11 at 21:42
  • @andufo Note that this will remove 333333 as well. It will also remove comma without space, rather than adding a space after it etc. – FailedDev Dec 05 '11 at 21:49
  • @JamWaffles not only facebook, also Disqus and WP. – Andres SK Dec 05 '11 at 21:50
  • @FailedDev I see, also "My house ,which" becomes "My housewhich" instead of "My house, which" ;) – Andres SK Dec 05 '11 at 21:51
  • I fixed both of the problems. The mistake was using ```.``` which means any character, when it should only be letters. @andufo, by "last pattern" I was referring to the last element in the $pats array. – Brigand Dec 05 '11 at 21:55
  • is this portable to javascript? i tried using the .replace(regex,string) way but it doesn't work exactly like preg_replace() – Andres SK Dec 07 '11 at 05:43
  • 1
    @andufo, it needs some modifications to work in JS. Here's a [fiddle](http://jsfiddle.net/tvJUg/). – Brigand Dec 07 '11 at 13:00
  • @FakeRainBrigand you are the regex master. My resolution for january will be to master this art as well. Thanks once again! – Andres SK Dec 07 '11 at 15:00
  • @FakeRainBrigand hi again, the /\s+/g is also deleting new lines. I tried an approach that just takes care of extra new lines, didn't work: $('#translate_q').val($('#translate_q').val().replace(/\v+/g,'')); – Andres SK Dec 12 '11 at 18:18
  • Replace the ```\s``` with ```[ ]``` for just spaces or ```[ \t]``` for spaces and tabs. You don't technically need the square brackets for a space, but it makes it quite a bit easier to read. – Brigand Dec 12 '11 at 19:55
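The same distinction applies to the PHP patterns in the answer. A small sketch, assuming a two-line input, that compares the `\s+(,)` pattern with a `[ \t]+(,)` variant (the variable names are illustrative):

```php
<?php
// Sample input with a line break directly before a comma.
$text = "first line\n,second line , end";

// \s also matches "\n", so the line break before the comma is removed.
$greedy = preg_replace('/\s+(,)/', '$1', $text);

// [ \t] matches only spaces and tabs, so the line break survives.
$narrow = preg_replace('/[ \t]+(,)/', '$1', $text);

var_dump($greedy);
var_dump($narrow);
```

With `[ \t]`, only the space before the second comma is removed and the text stays on two lines.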
2
 - $result = preg_replace('/!+/', '!', $subject);
 - $result = preg_replace('/\.*,/', ',', $result);
 - $result = preg_replace('/\s+(?=,)/', '', $result);
 - $result = preg_replace('/^,*|,*$/', '', $result);
 - $result = preg_replace('/([a-z])\1+/i', '$1$1', $result);
 - $result = preg_replace('/,(?!\s)/', ', ', $result);

One by one matching to your rules :)
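The six rules can also be chained in one helper, feeding each result into the next step. This is only a sketch: the function name clean_sentence is hypothetical, and the extra step that collapses runs of commas is an assumption added to cover the ",,," in the third example, not one of the six rules.

```php
<?php
// Hypothetical helper; each step receives the output of the previous one.
function clean_sentence(string $subject): string
{
    $steps = array(
        array('/!+/', '!'),              // rule 1: collapse runs of "!"
        array('/\.*,/', ','),            // rule 2: dots before a comma are dropped
        array('/\s+(?=,)/', ''),         // rule 3: no whitespace before a comma
        array('/,{2,}/', ','),           // assumption: collapse runs of commas
        array('/^,*|,*$/', ''),          // rule 4: no leading or trailing comma
        array('/([a-z])\1+/i', '$1$1'),  // rule 5: at most two identical letters in a row
        array('/,(?!\s)/', ', '),        // rule 6: a space after every comma
    );

    foreach ($steps as list($pattern, $replacement)) {
        $subject = preg_replace($pattern, $replacement, $subject);
    }

    return $subject;
}

// The three sample sentences from the question.
$samples = array(
    ',My house, which is green., is nice!',
    'My house..., which is green, is nice!!!',
    'My house ,which is green,,, is nice!!',
);

foreach ($samples as $sample) {
    echo clean_sentence($sample), "\n";
}
```

All three samples come out as "My house, which is green, is nice!". Without the comma-collapsing step, the third one would end up as "green, , , is", because rule 6 inserts a space after each comma in the run.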

FailedDev