I am using preg_match to find and remove evaled base64 encoded viruses within files.
The regex below:
/\s*eval\s*\(\s*base64_decode\s*\(\s*('[a-zA-Z0-9\+\/]*={0,2}'|"[a-zA-Z0-9\+\/]*={0,2}")\s*\)\s*\s*\)\s*(;)?\s*/
matches the following code:
eval(base64_decode("BASE64+ENCODED+VIRUS+HERE"));
The above regex works fine.
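For reference, here is a minimal sketch of how the first regex behaves when applied with preg_match and preg_replace. The variable names and the surrounding "echo" lines are made up for illustration, and the use of preg_replace for the removal step is an assumption. The pattern itself is copied verbatim from above.

```php
<?php
// Pattern copied verbatim from the question; a nowdoc avoids escaping the quotes.
$pattern = <<<'REGEX'
/\s*eval\s*\(\s*base64_decode\s*\(\s*('[a-zA-Z0-9\+\/]*={0,2}'|"[a-zA-Z0-9\+\/]*={0,2}")\s*\)\s*\s*\)\s*(;)?\s*/
REGEX;

// Hypothetical inputs: one single-string payload, one word-wrapped via concatenation.
$single  = 'eval(base64_decode("BASE64+ENCODED+VIRUS+HERE"));';
$wrapped = 'eval(base64_decode("BASE64+EN" . "CODED+VIRUS+HERE"));';

// The single-string form matches in full.
$singleResult = preg_match($pattern, $single, $singleMatch);

// The concatenated form is not matched by this first regex.
$wrappedResult = preg_match($pattern, $wrapped);

// Removal step (assumed to be done with preg_replace): strip the payload from a file body.
$cleaned = preg_replace($pattern, '', "echo 1;\n" . $single . "\necho 2;");

var_dump($singleResult, $wrappedResult, $cleaned);
```

Note that the leading and trailing \s* also consume the whitespace around the payload, so the neighbouring statements end up on one line after removal.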
I also wanted to match base64 strings that are word-wrapped via concatenation, so it should match the following as well: "BASE64+EN" . "CODED+VIRUS+HERE".
So I changed the regex into:
/\s*eval\s*\(\s*base64_decode\s*\(\s*\'([a-zA-Z0-9\+\/]*(\'\s*\.\s*\')?[a-zA-Z0-9\+\/]*)*={0,2}\'|"([a-zA-Z0-9\+\/]*("\s*\.\s*")?[a-zA-Z0-9\+\/]*)*={0,2}"\s*\)\s*\s*\)\s*(;)?\s*/
Which finds a partial match for:
"BASE64+ENCODED+VIRUS+HERE"));
But when I try to apply the regex to this entire file: http://pastebin.com/ED8sFUP0 the page dies with the browser message "The connection to the server was reset while the page was loading.".
I have error reporting activated:
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('scream.enabled', TRUE);
But nothing shows up, neither on the page nor in Apache's error logs.
The very same regex works as expected on files that do not contain the offending string. In that case preg_match does not return boolean false; it returns 0, meaning that there was no regex error and that no match was found.
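The false-versus-0 check can be sketched roughly as follows, using preg_last_error() to name the error when preg_match returns false. The helper preg_error_name() is a hypothetical name introduced here, and the invalid UTF-8 input is only a convenient way to force a reportable error. This kind of check can only report errors that PCRE hands back to PHP; it cannot run if the PHP process is terminated before preg_match returns.

```php
<?php
// Hypothetical helper: map a preg_last_error() code to its constant name.
function preg_error_name($code)
{
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
    );
    return isset($names[$code]) ? $names[$code] : 'unknown (' . $code . ')';
}

// Case 1: the pattern runs fine but finds nothing, so the result is int 0.
$noMatch      = preg_match('/a/', 'b');
$noMatchError = preg_last_error();

// Case 2: an error PCRE can report (invalid UTF-8 with the /u flag), so the result is false.
$failed      = preg_match('/./u', "\xff");
$failedError = preg_last_error();

var_dump($noMatch, preg_error_name($noMatchError));
var_dump($failed, preg_error_name($failedError));
```

The strict comparison matters here: 0 and false are both falsy, so only === false distinguishes "no match" from "the regex engine reported an error".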
My concern is not necessarily why the regex finds only a partial match. That's probably some typo I made that happens to work.
I want to know when and how the regex compiler can fail and break the entire process chain:
apache > php > regex_compiler
I understand that it may very well be because of my regex, which just happens to compile correctly but not match correctly, and that this might cause something bad down the road. But my interest is in why the regex compiler fails with no error, and how I can get the error message that it should be yielding.
Something similar is discussed but unresolved here: php preg_match_all kills page for unknown reason