
Disclaimer: I am not looking for an alternative regular expression that avoids branch reset groups. I want to understand why the output differs between PHP 5 and PHP 7.

Problem: In PHP, I tried to use a branch reset group to match a string that consists of groups of digits joined by one of several possible separators.

```php
$string = '12-34-56-78';
$pattern = '/^\d{2}(?|(---)|(-)|(\.)|(\:))\d{2}\1\d{2}\1\d{2}$/';
$matches = preg_match($pattern, $string) === 1;
var_dump($matches);
```

Unfortunately, this works only on PHP < 7. I also checked the libpcre version, and it is not the source of the problem: the same libpcre version returns different results under different PHP versions.
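To compare builds, it can help to print the runtime and PCRE versions next to the match result. The following is only a minimal sketch reusing the pattern from the question; `PHP_VERSION` and `PCRE_VERSION` are standard PHP constants, and no particular output is assumed here.

```php
<?php
// Same input and pattern as in the question.
$string  = '12-34-56-78';
$pattern = '/^\d{2}(?|(---)|(-)|(\.)|(\:))\d{2}\1\d{2}\1\d{2}$/';

// preg_match() returns 1 on a match, 0 on no match, false on error.
$result = preg_match($pattern, $string);

// Show which PHP runtime and which PCRE library produced the result,
// so that runs on different builds can be compared side by side.
printf("PHP %s, PCRE %s\n", PHP_VERSION, PCRE_VERSION);
var_dump($result);
```

Running the same script under several PHP versions makes it easy to see whether the result follows the PHP version or the PCRE version.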

I was not able to find any reference to what changed in PHP 7, or why.

Question: why is the output different for PHP 5 and PHP 7? Is this expected behavior for PHP 7?

Update: seems to be a bug.

Kolyunya
  • You do not need any branch reset here because you need to refer to the whole group value. A mere [`^\d{2}(---|[-.:])\d{2}\1\d{2}\1\d{2}$`](https://regex101.com/r/oqj4Ux/1) will work in every version. – Wiktor Stribiżew May 18 '17 at 13:30
  • @WiktorStribiżew thank you again. I know I can use another `regex`. But this is not what I'm asking about. – Kolyunya May 18 '17 at 13:33
  • I think we can reduce this to `$string = '--'; $pattern = '/^(?|(---)|(-)\1)/';` and observe the same behaviour. – Sebastian Proske May 18 '17 at 13:41
  • The whole duplicate group functionality is not working in PHP 7, the `'/(?J)^\d{2}(?|(?<f>---)|(?<f>-)|(?<f>\.)|(?<f>:))\d{2}\g{f}\d{2}\g{f}\d{2}$/'` does not work either. – Wiktor Stribiżew May 18 '17 at 13:42
  • Wouldn't this depend more on the bound `libpcre`/`PCRE_VERSION` than the PHP runtime? -- Also could you make the regex `/x` more spacey here - for decorative purposes? – mario May 18 '17 at 14:15
  • @mario I'm sorry I don't get it how to make it more spacey. I think the pattern should be as simple as possible. – Kolyunya May 18 '17 at 14:19
  • @mario I also checked the version of the `libpcre` and no, it's not the source of the problem. https://3v4l.org/ndQQ8 – Kolyunya May 18 '17 at 14:33
  • Interesting but without modifying branch reset constructor, this works [`^\d+(?|(---)|(-)|(\.)|(\:))\d+\1\d+\1\d+$`](https://3v4l.org/OmkaN) – revo May 25 '17 at 17:40

2 Answers


Are you sure preg_match returns 0 (no match) and not FALSE (an error)?
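One way to tell the two cases apart is a strict comparison on the return value. This is a minimal sketch using the pattern from the question; `preg_last_error()` is a standard PHP function, and the messages printed are purely illustrative.

```php
<?php
$string  = '12-34-56-78';
$pattern = '/^\d{2}(?|(---)|(-)|(\.)|(\:))\d{2}\1\d{2}\1\d{2}$/';

$result = preg_match($pattern, $string);

if ($result === false) {
    // An error occurred; preg_last_error() returns a PREG_*_ERROR code.
    echo 'preg_match failed, error code: ', preg_last_error(), "\n";
} elseif ($result === 0) {
    // The pattern ran without error but found no match.
    echo "Pattern ran but did not match\n";
} else {
    echo "Pattern matched\n";
}
```

The strict `===` matters because `0 == false` is true in PHP, so a loose comparison would hide the difference.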

Edit

I still don't know why, but swapping (-) and (---) solves the problem:

/^\d{2}(?|(-)|(---)|(\.)|(\:))\d{2}\1\d{2}\1\d{2}$/


Edit 2

With PHP 7, it seems that only the first alternative of the branch reset group works. With the original pattern, the match also fails for each of the following strings:

$string = '12.34.56.78'; 
$string = '12:34:56:78'; 
$string = '12---34---56---78'; 

Maybe it is a PCRE bug, because the branch reset syntax seems correct to me.
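The observations above can be checked in one run with a small loop. This is only a sketch: the labels `original` and `reordered` are names chosen here for illustration, and the results are deliberately not listed because they depend on the PHP build.

```php
<?php
// The pattern from the question and the variant with (-) and (---) swapped.
$patterns = [
    'original'  => '/^\d{2}(?|(---)|(-)|(\.)|(\:))\d{2}\1\d{2}\1\d{2}$/',
    'reordered' => '/^\d{2}(?|(-)|(---)|(\.)|(\:))\d{2}\1\d{2}\1\d{2}$/',
];

// One test string per separator alternative.
$strings = ['12-34-56-78', '12.34.56.78', '12:34:56:78', '12---34---56---78'];

foreach ($patterns as $label => $pattern) {
    foreach ($strings as $string) {
        $result = preg_match($pattern, $string);
        // var_export() keeps 0 and false distinguishable in the output.
        printf("%-9s %-18s %s\n", $label, $string, var_export($result, true));
    }
}
```

If only the first alternative works on a given build, each pattern should match exactly one of the four strings there.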

Stephane Janicaud

It was a bug and it was fixed.

Kolyunya