I need to wrap every word in a string that is at least 2 characters long in a span tag (the example below uses anchor tags). Question marks, punctuation, etc. should be left outside the tags (they must only hold a-z and special characters such as ñ, á, é, etc.).

So, this:

Prenda de vestir que se ajusta? A la cintura y llega generalmente hasta el pie.

Should be this:

<a href=http://example.com/prenda>Prenda</a> <a href=http://example.com/de>de</a> <a href=http://example.com/vestir>vestir</a> <a href=http://example.com/que>que</a> 
<a href=http://example.com/se>se</a> <a href=http://example.com/ajusta>ajusta</a>? A <a href=http://example.com/la>la</a> 
<a href=http://example.com/cintura>cintura</a> y <a href=http://example.com/llega>llega</a> 
<a href=http://example.com/generalmente>generalmente</a> <a href=http://example.com/hasta>hasta</a> <a href=http://example.com/el>el</a> <a href=http://example.com/pie>pie</a>.

Any ideas? Thanks!

Andres SK

3 Answers

2

Use this:

$result = preg_replace('/\b[\p{L}\p{M}]{2,}\b/u', '<a href=http://example.com/$0>$0</a>', $subject);

All letters, all accents.

Why:

\b              # Assert position at a word boundary
[\p{L}\p{M}]    # Match a single character that is one of the following:
                #   a character with the Unicode property “letter” (any kind of letter from any language)
                #   a character with the Unicode property “mark” (a character intended to be combined with another character, e.g. accents, umlauts, enclosing boxes)
   {2,}         # Between 2 and unlimited times, as many times as possible, giving back as needed (greedy)
\b              # Assert position at a word boundary

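Edit: to lowercase the word inside the URL while keeping the link text unchanged, use a callback:

$result = preg_replace_callback(
        '/\b[\p{L}\p{M}]{2,}\b/u',
        // Anonymous function; create_function() was removed in PHP 8.
        function ($matches) {
            // mb_strtolower (mbstring extension) also lowercases accented letters.
            return '<a href=http://example.com/' . mb_strtolower($matches[0], 'UTF-8') . '>' . $matches[0] . '</a>';
        },
        $subject
);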
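
As a rough, self-contained sketch of the callback approach above: the helper name `link_words` is hypothetical, and the code assumes the mbstring extension is available and that the input is valid UTF-8. The pattern and URL scheme are the ones used in this answer.

```php
<?php
// Hypothetical helper; pattern and URL scheme come from the answer above.
function link_words(string $subject): string
{
    $linked = preg_replace_callback(
        '/\b[\p{L}\p{M}]{2,}\b/u',
        function (array $m): string {
            // Lowercase only the URL part; the visible text keeps its case.
            // Assumes the mbstring extension is loaded.
            return '<a href=http://example.com/' . mb_strtolower($m[0], 'UTF-8')
                . '>' . $m[0] . '</a>';
        },
        $subject
    );

    // preg_replace_callback() returns null on failure (e.g. invalid UTF-8),
    // in which case this sketch falls back to the untouched input.
    return $linked ?? $subject;
}

echo link_words('Prenda de vestir que se ajusta? A la cintura.'), "\n";
```

Single-letter words such as "A" and the punctuation stay outside the links, because the quantifier requires at least two letters and the character class only accepts letters and combining marks.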
FailedDev
1

Use this instead:

\b(\w{2,})\b

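Basically, \b is a "word boundary" (it matches at the beginning and end of a word, which keeps punctuation out of the match). \w is a word character: letters, digits and the underscore. It can be replaced with [a-zA-Z] to exclude the [0-9_] characters, although that class will not match accented letters such as ñ or á; for those, \p{L} with the u modifier is the safer choice. The quantifier {2,} then requires 2 or more characters.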

The replacement:

<a href="http://example.com/$1">$1</a>

And the always-appreciated example. (And an example converting to anchor tags instead.)
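
To make the difference between \w and a letters-only class concrete, here is a small sketch in PHP. The sample string and variable names are made up for illustration; the replacement string is the one shown above.

```php
<?php
$replacement = '<a href="http://example.com/$1">$1</a>';
$subject = 'la talla_42 es XL'; // made-up sample containing digits and an underscore

// \w accepts digits and underscores, so "talla_42" is linked as a single word.
$withWordClass = preg_replace('/\b(\w{2,})\b/', $replacement, $subject);

// Letters only (Unicode-aware through the u modifier): "talla_42" is left alone,
// because there is no word boundary between "talla" and "_42".
$lettersOnly = preg_replace('/\b(\p{L}{2,})\b/u', $replacement, $subject);

echo $withWordClass, "\n";
echo $lettersOnly, "\n";
```

Which variant is appropriate depends on whether tokens that mix letters with digits or underscores should count as words.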

Brad Christie
0

Here's an example:

<?php
$without = "Prenda de vestir que se ajusta? A la cintura y llega generalmente hasta el pie.";
// [A-Za-z] covers ASCII letters only, so accented letters such as ñ or á are not matched.
$with = preg_replace("/([A-Za-z]{2,})/", "<a href=\"http://example.com/\\1\">\\1</a>", $without);
print $with;
?>
favoretti