2

I have a string which may contain a pattern like:

LINK([anchor text],[link])

What I would like to do is transform this expression into an HTML link:

<a href="link">anchor text</a>

At the moment, I'm performing the replacement with the following PHP snippet:

$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
$replace = '<a href="$2">$1</a>';
$result = preg_replace($search, $replace, $string);

The problem I'm facing is the whitespace after the anchor text. Fortunately, HTML collapses multiple spaces into a single one, but in this example the link would still end with an annoying (underlined) space. Is there any way to trim the anchor text? I can't match it the same way as the "link" substring, since it may contain spaces.

Giorgio
  • 2
    Perhaps [preg_replace_callback()](http://www.php.net/manual/en/function.preg-replace-callback.php) with some code in the callback to handle trimming as well as the actual replace – Mark Baker Jan 07 '14 at 15:31
  • possible duplicate of [Regex trim or preg\_replace white space including tabs and new lines](http://stackoverflow.com/questions/9129368/regex-trim-or-preg-replace-white-space-including-tabs-and-new-lines) – Kumar V Jan 07 '14 at 15:32
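
The preg_replace_callback() idea from the first comment can be sketched roughly as follows. The pattern and input are the ones from the question; the callback body is only an illustration of where trim() would go, not code posted by the commenter.

```php
<?php
// Pattern and input are taken from the question; the callback is illustrative.
$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';

$result = preg_replace_callback(
    $search,
    function (array $m) {
        // $m[1] is the anchor text (possibly with trailing spaces), $m[2] the link.
        return '<a href="' . $m[2] . '">' . trim($m[1]) . '</a>';
    },
    $string
);

echo $result, PHP_EOL; // <a href="http://mydomain.com">some anchor text</a>
```

This leaves the original pattern untouched and does the trimming in PHP, at the cost of a function call per match.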

3 Answers

2

Assuming that the anchor text cannot contain commas or more than 1 space in a row, you could perhaps use:

LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)

regex101 demo

Instead of .+, I'm using [^\s,]+(?:\s[^\s,]+)*, which matches one word optionally followed by more words, each preceded by a single whitespace character (where a word is a run of one or more characters that are neither whitespace nor commas).

I also changed the negated class [^\s], which appears later on, to the equivalent \S.
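
As a quick check, here is the pattern above dropped into the PHP code from the question. The second input is a made-up string that violates the "no more than 1 space in a row" assumption, to show what happens in that case.

```php
<?php
$pattern = '/LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)/';
$replace = '<a href="$2">$1</a>';

// Input from the question: the trailing spaces stay outside group 1.
$ok = preg_replace($pattern, $replace, 'LINK(  some anchor text    ,   http://mydomain.com  )');
echo $ok, PHP_EOL; // <a href="http://mydomain.com">some anchor text</a>

// Hypothetical input with two consecutive spaces inside the anchor text:
// the assumption is broken, the pattern does not match, and the input
// comes back unchanged.
$doubleSpace = 'LINK(some  text, http://mydomain.com)';
$unchanged = preg_replace($pattern, $replace, $doubleSpace);
echo $unchanged, PHP_EOL; // LINK(some  text, http://mydomain.com)
```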

Jerry
  • 1
    Nice job @Jerry! That's what I was looking for. I've slightly edited the pattern to `LINK\s*\(\s*([^\s]+(?:\s[^\s]+)*)\s*,\s*(\S+)\s*\)` because my intention was to allow commas too within anchor text. Thank you! – Giorgio Jan 07 '14 at 15:45
  • 1
    @Giorgio Okay, cool! Without those commas in the character class, you might get a little more overhead, but that does make the regex more flexible! Note that since you removed the commas, you can actually use `\S` instead of `[^\s]` :) – Jerry Jan 07 '14 at 15:46
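
A small sketch of the comma-tolerant variant discussed in these comments, next to the `\S` shorthand form. The input string with a comma in the anchor text is made up for illustration.

```php
<?php
$replace = '<a href="$2">$1</a>';
// Hypothetical input: the anchor text itself contains a comma.
$input   = 'LINK(  Hello, world   ,  http://mydomain.com )';

// Pattern as edited in the comment, and the same pattern written with \S.
$withClass     = '/LINK\s*\(\s*([^\s]+(?:\s[^\s]+)*)\s*,\s*(\S+)\s*\)/';
$withShorthand = '/LINK\s*\(\s*(\S+(?:\s\S+)*)\s*,\s*(\S+)\s*\)/';

$a = preg_replace($withClass, $replace, $input);
$b = preg_replace($withShorthand, $replace, $input);

echo $a, PHP_EOL;    // <a href="http://mydomain.com">Hello, world</a>
var_dump($a === $b); // bool(true)
```

The last comma followed by a space-free link ends up as the separator, so commas earlier in the anchor text are kept.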
1

You could make the relevant quantifiers lazy, so that they don't consume the whitespace before the , or the ):

'/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/'

by adding a ? after each +.

Test
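
To see what the two lazy groups capture, the pattern above can be run through preg_match() on the string from the question. The last check uses a made-up input and is only there to point out a difference from the question's pattern.

```php
<?php
$search = '/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/';
$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';

preg_match($search, $string, $m);
var_export($m[1]); echo PHP_EOL; // 'some anchor text'
var_export($m[2]); echo PHP_EOL; // 'http://mydomain.com'

// This variant has no \s* between LINK and "(", so an input such as
// "LINK (" (hypothetical) is not matched.
$spaced = preg_match($search, 'LINK (text, http://mydomain.com)');
var_export($spaced); echo PHP_EOL; // 0
```

If whitespace between LINK and the parenthesis should be allowed, as in the question's pattern, the `\s*` after LINK would need to be put back.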

Jonny 5
1

What you can do in this case is make the first group match lazily.

$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';

Can be changed to:

$search = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';

Notice the question mark after the plus. This tells the regex engine to match as few characters as possible.

In this case, the shortest possible match is the anchor text itself; the \s* that follows then consumes any number of spaces before the comma.

In the original pattern, .+ matches greedily: it tries to match as many characters as possible, so it captures everything up to the comma, trailing spaces included.

Here is a regex101 of the code.
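
The difference between the two patterns can also be checked directly in PHP by comparing what the first group captures. This is a small sketch using the string from the question; the variable names are arbitrary.

```php
<?php
$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$greedy = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';      // original pattern
$lazy   = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';  // lazy first group

preg_match($greedy, $string, $g);
preg_match($lazy, $string, $l);

var_export($g[1]); echo PHP_EOL; // anchor text with its trailing spaces
var_export($l[1]); echo PHP_EOL; // 'some anchor text'

$html = preg_replace($lazy, '<a href="$2">$1</a>', $string);
echo $html, PHP_EOL; // <a href="http://mydomain.com">some anchor text</a>
```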

Sean