I am currently updating a very old script, written for PHP 5.2.17, to run on PHP 8.1.2. It contains a lot of text-processing code, and almost all of it uses preg_match/preg_match_all. I have always understood strpos to be faster than preg_match for plain string matching, but I decided to check once more.
The code was:
$c = file_get_contents('readme-redist-bins.txt');
$start = microtime(true);
for ($i=0; $i < 1000000; $i++) {
    strpos($c, '[SOMEMACRO]');
}
$el = microtime(true) - $start;
exit($el);
and
$c = file_get_contents('readme-redist-bins.txt');
$start = microtime(true);
for ($i=0; $i < 1000000; $i++) {
    preg_match_all("/\[([a-z0-9-]{0,100})".'[SOMEMACRO]'."/", $c, $pma);
}
$el = microtime(true) - $start;
exit($el);
I used the readme-redist-bins.txt file that ships with the PHP 8.1.2 distribution; it is about 30KB.
Results(preg_match_all):
PHP_8.1.2: 1.2461s
PHP_5.2.17: 11.0701s
Results(strpos):
PHP_8.1.2: 9.97s
PHP_5.2.17: 0.65s
I double-checked: I tried Windows and Linux PHP builds, on two machines.
Then I tried the same code with a small file (200B):
Results(preg_match_all):
PHP_8.1.2: 0.0867s
PHP_5.2.17: 0.6097s
Results(strpos):
PHP_8.1.2: 0.0358s
PHP_5.2.17: 0.2484s
With the small file the timings look as expected.
So how can it be that preg_match is so much faster on a large text? Any ideas?
PS: I also tried PHP 7.2.10, with the same result.
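For anyone who wants to reproduce this, here is a minimal sketch of a combined harness that times both calls in one script. It is not the exact code measured above: the function names bench_strpos and bench_preg are hypothetical, the input is a synthetic string rather than readme-redist-bins.txt, and the iteration count is reduced. It also passes the macro through preg_quote, because an unescaped [SOMEMACRO] inside a pattern is a character class that matches a single character, so the two snippets above are not searching for the same thing.

```php
<?php
// Hypothetical helper: time $iterations calls of strpos() on $haystack.
function bench_strpos($haystack, $needle, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $pos = strpos($haystack, $needle);
    }
    return microtime(true) - $start;
}

// Hypothetical helper: time $iterations calls of preg_match_all().
function bench_preg($haystack, $pattern, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $count = preg_match_all($pattern, $haystack, $matches);
    }
    return microtime(true) - $start;
}

// Synthetic input (an assumption, stands in for the real 30KB file):
// filler text followed by one occurrence of the macro.
$haystack = str_repeat("Lorem ipsum dolor sit amet.\n", 1000) . '[abc-1[SOMEMACRO]';
$needle   = '[SOMEMACRO]';

// preg_quote() escapes the brackets so the macro is matched literally.
$pattern = '/\[([a-z0-9-]{0,100})' . preg_quote($needle, '/') . '/';

$iterations = 1000; // reduced from 1000000 so the sketch finishes quickly

printf("strpos:         %.4fs\n", bench_strpos($haystack, $needle, $iterations));
printf("preg_match_all: %.4fs\n", bench_preg($haystack, $pattern, $iterations));
```

The helpers avoid closures and type declarations so the same file should also run on the old 5.2 interpreter. Swapping the synthetic string for file_get_contents('readme-redist-bins.txt') and raising $iterations would bring it closer to the original measurement.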