I'm trying to create a regex that matches the last five "words" of the input, where a "word" is anything that matches [^ ]+ or [^ ]*<[^>]*>[^ ]* (that is, anything separated by spaces, with spaces between < and > counted as letters).

I tried this:

/([^ ]+(?:(?<!<[^>]+) +(?![^<]*>)(?:.*?)){0,4})$/

but it fails with an error saying that the lookbehind must be fixed length.

Say I have the following string:

'It\'s just that he <span class="verb">appear</span>ed rather late.'

it should match

'that he <span class="verb">appear</span>ed rather late.'
joelproko
  • Please add one or more example strings and the expected output. – Casimir et Hippolyte May 28 '15 at 13:43
  • One way would be `while (preg_match('/<[^> ]* /',$input)) $input = preg_replace('/(<[^> ]*) /','$1'."\0",$input);` `preg_match('/(?:(?:[^ ]+) ){0,4}[^ ]*$/',$input,$match);` `$input = str_replace("\0"," ",$input);` `$match[0] = str_replace("\0"," ",$match[0]);` but that seems rather crude and might break if the placeholder character (\0 here) already appears in the input – joelproko May 28 '15 at 13:53
  • In case you weren't aware, please take a look at the top answer here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags :-) – Geo May 28 '15 at 14:08
  • Is pure regex needed here for some reason? Wouldn't it be much easier to first use `strip_tags()` and then count the "words", or even `explode()` the string? – D. Cichowski May 28 '15 at 14:12
  • Decent idea, SilentDariusz, but the tags need to stay. – joelproko May 29 '15 at 09:11

2 Answers

I think your solution was already pretty close. Please see this one:

$str = 'It\'s just that he <span class="verb">appear</span>ed rather late.';
$reg = '/(([^ ]*<[^>]*>[^ ]*)+|[^ ]+)/'; // a word containing tags (spaces allowed inside <...>), or a plain run of non-spaces
if (preg_match_all($reg, $str, $m)) { // "_all" to collect every word, not just the first
    $m = array_slice($m[0], -5, 5, true); // last 5 words; true preserves the original keys
    //$m = implode(' ', $m); // uncomment this if you want a string instead of an array
    print_r($m);
}

Returns:

Array
(
    [2] => that
    [3] => he
    [4] => <span class="verb">appear</span>ed
    [5] => rather
    [6] => late.
)
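
If you need the result as a string, here is a minimal sketch that wraps the same idea in a function. The name lastWords and the $n parameter are placeholders of mine, not part of the code above, and the sketch assumes that words are separated by single spaces, so joining them with ' ' gives back the original text.

```php
<?php
// Hypothetical helper: returns the last $n "words" of $str as one string.
// A <...> tag counts as part of the word around it, even if it contains spaces.
function lastWords(string $str, int $n = 5): string
{
    if ($n < 1) {
        return '';
    }
    // Same pattern as above, with non-capturing groups only.
    $reg = '/(?:[^ ]*<[^>]*>[^ ]*)+|[^ ]+/';
    if (!preg_match_all($reg, $str, $m)) {
        return '';
    }
    // Assumption: single-space separators, so implode(' ') restores the text.
    return implode(' ', array_slice($m[0], -$n));
}

$str = 'It\'s just that he <span class="verb">appear</span>ed rather late.';
echo lastWords($str), "\n"; // that he <span class="verb">appear</span>ed rather late.
```

As noted in the comments, this still treats tags as flat text, so a > inside an attribute value would confuse it.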
Geo
  • Nice. Works for my specific case. Probably won't work if there were any nested tags (say, if there was something like `appeared`), but luckily that isn't the case for me. – joelproko Jun 03 '15 at 11:03
  • Right. In fact there can be more issues. Remember: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Geo Jun 04 '15 at 18:27

A simple way:

preg_match('~^(?:\s*[^>\s]*(?:>[^<]*<[^>\s]*)*){0,5}~', strrev(rtrim($str)), $m);
$result = strrev($m[0]);
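
To show what the pattern does, here is the same regex written with the x (extended) flag and commented piece by piece. The variable name $pattern and the comments are my additions; the logic is unchanged. Because the string is reversed, "the last five words" become "the first five words", and each tag is read from > back to <.

```php
<?php
$str = 'It\'s just that he <span class="verb">appear</span>ed rather late.';

// Same pattern as above, spelled out in extended mode.
$pattern = '~
    ^                  # start of the reversed string, i.e. end of the original
    (?:
        \s*            # whitespace separating this word from the previous one
        [^>\s]*        # word characters outside any tag
        (?:
            >[^<]*<    # one whole tag, reversed; it may contain spaces
            [^>\s]*    # word characters that follow the tag
        )*
    ){0,5}             # at most five words
~x';

preg_match($pattern, strrev(rtrim($str)), $m);
$result = strrev($m[0]);
echo $result, "\n"; // that he <span class="verb">appear</span>ed rather late.
```

Note that strrev() works byte by byte, so this sketch assumes the pattern is applied without the u modifier.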
Casimir et Hippolyte