8

Is there a way to simulate the LIKE operator of SQL in PHP with the same syntax? (% and _ wildcards and a generic $escape escape character)? So that having:

$value LIKE $string ESCAPE $escape

you can have a function that returns the PHP evaluation of that without using the database? (consider that the $value, $string and $escape values are already set).

Kermit
  • 33,827
  • 13
  • 85
  • 121
Shoe
  • 74,840
  • 36
  • 166
  • 272
  • 1
    In PCRE, `.` == `_` and `.*` == `%`, so you can go from there. – DaveRandom Jul 11 '12 at 14:01
  • @DaveRandom, thanks, could you make me an example of the complete regex of "SUSA_AND%WITH!%" with `!` as escape character? – Shoe Jul 11 '12 at 14:04
  • I'm just trying to work out a safe way to build a regex dynamically from an expression. The expression you want for that instance is `/SUSA.AND.*?WITH/` because you have not used your escape character, but you presumably need to build the expr dynamically. I have realised that non-greedy quanitifiers will probably be necessary to mimic SQL behaviour exactly, hence the `?` – DaveRandom Jul 11 '12 at 14:06
  • @DaveRandom, I have used my escape character. It's at the end. – Shoe Jul 11 '12 at 14:13
  • 2
    Oh right yeh, sorry totally missed it. Well then it would be `/SUSA.AND.*?WITH%/`since `%` is not a meta character in PCRE. @nickb's answer looks promising and it looks like he's working on improving it to a full working solution, but I'll have a go myself so we can see who's better. (It's me) (Just kidding) – DaveRandom Jul 11 '12 at 14:23
  • @DaveRandom, waiting for you answer too then :) – Shoe Jul 11 '12 at 14:27
  • 1
    I'm having some great fun with this, I'll let you know when I have something I'm happy with :-) – DaveRandom Jul 11 '12 at 15:17

4 Answers4

5

OK, after much fun and games here's what I have come up with:

function preg_sql_like ($input, $pattern, $escape = '\\') {

    // Split the pattern into special sequences and the rest
    $expr = '/((?:'.preg_quote($escape, '/').')?(?:'.preg_quote($escape, '/').'|%|_))/';
    $parts = preg_split($expr, $pattern, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // Loop the split parts and convert/escape as necessary to build regex
    $expr = '/^';
    $lastWasPercent = FALSE;
    foreach ($parts as $part) {
        switch ($part) {
            case $escape.$escape:
                $expr .= preg_quote($escape, '/');
                break;
            case $escape.'%':
                $expr .= '%';
                break;
            case $escape.'_':
                $expr .= '_';
                break;
            case '%':
                if (!$lastWasPercent) {
                    $expr .= '.*?';
                }
                break;
            case '_':
                $expr .= '.';
                break;
            default:
                $expr .= preg_quote($part, '/');
                break;
        }
        $lastWasPercent = $part == '%';
    }
    $expr .= '$/i';

    // Look for a match and return bool
    return (bool) preg_match($expr, $input);

}

I can't break it, maybe you can find something that will. The main way in which mine differs from @nickb's is that mine "parses"(ish) the input expression into tokens to generate a regex, rather than converting it to a regex in situ.

The first 3 arguments to the function should be fairly self explanatory. The fourth allows you to pass PCRE modifiers to affect the final regex used for the match. The main reason I put this in is to allow you to pass i so it is case insensitive - I can't think of any other modifiers that will be safe to use but that may not be the case. Removed per comments below

Function simply returns a boolean indicating whether the $input text matched the $pattern or not.

Here's a codepad of it

EDIT Oops, was broken, now fixed. New codepad

EDIT Removed fourth argument and made all matches case-insensitive per comments below

EDIT A couple of small fixes/improvements:

  • Added start/end of string assertions to generated regex
  • Added tracking of last token to avoid multiple .*? sequences in generated regex
DaveRandom
  • 87,921
  • 11
  • 154
  • 174
  • I think LIKE is case insensitive by default. So it would be better to just delete the last parameter and add a `i`. – Shoe Jul 11 '12 at 16:04
  • @Jeffrey I think it depends on the collation of the table/column, hang on will check... – DaveRandom Jul 11 '12 at 16:05
  • Also `var_dump(preg_sql_like('#%ST/_IN_/%', '#ANYCHARS HERE TEST_ING%', '/'));` gives false, while it should be true. – Shoe Jul 11 '12 at 16:08
  • You have the input and expression strings the wrong way round. [Works if you swap the arguments](http://codepad.viper-7.com/UmoLQd). And you are right about the case sensitivity it would seem, I will edit the code above. – DaveRandom Jul 11 '12 at 16:10
  • I think the escape character is also optional. How to make it so that if I don't want an escape char then no one is searched? – Shoe Jul 11 '12 at 16:36
  • Well in SQL (MySQL at least) there is always an escape character, if you don't specify one then \ is assumed, which is why the third argument has a default value of `'\\'` (Damn markdown!) Ref: [`If you do not specify the ESCAPE character, “\” is assumed.`](http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like). I can't see how it work work without *an* escape character to be honest, what would you do if you want a literal wildcard? If you want a literal escape character, you escape it (like PHP) and this is accounted for in the above code. – DaveRandom Jul 11 '12 at 16:39
  • 1
    +1 Another great answer :) - This turned out to be a more interesting problem then I initially anticipated. – nickb Jul 11 '12 at 16:48
  • 1
    Another bigger difference is your wildcards are static - For some reason I created mine with generic wildcards, and that made the implementation more.. err, verbose. Clearly, when implementing strictly SQL LIKE, you don't need dynamic wildcards. – nickb Jul 11 '12 at 16:56
  • @nickb I did wonder where you were going with that, although I suppose it does leave room for future expansion of the SQL standard. Although TBH, if you're going to do that, you might as well just let the users specify full PCRE and be done with it. – DaveRandom Jul 11 '12 at 17:00
  • 1
    @Jeffrey A couple of small fixes above – DaveRandom Jul 11 '12 at 18:35
4

This is basically how you would implement something like this:

$input = '%ST!_ING_!%';
$value = 'ANYCHARS HERE TEST_INGS%';

// Mapping of wildcards to their PCRE equivalents
$wildcards = array( '%' => '.*?', '_' => '.');

// Escape character for preventing wildcard functionality on a wildcard
$escape = '!';

// Shouldn't have to modify much below this

$delimiter = '/'; // regex delimiter

// Quote the escape characters and the wildcard characters
$quoted_escape = preg_quote( $escape);
$quoted_wildcards = array_map( function( $el) { return preg_quote( $el); }, array_keys( $wildcards));

// Form the dynamic regex for the wildcards by replacing the "fake" wildcards with PRCE ones
$temp_regex = '((?:' . $quoted_escape . ')?)(' . implode( '|', $quoted_wildcards) . ')';

// Escape the regex delimiter if it's present within the regex
$wildcard_replacement_regex = $delimiter . str_replace( $delimiter, '\\' . $delimiter, $temp_regex) . $delimiter;

// Do the actual replacement
$regex = preg_replace_callback( $wildcard_replacement_regex, function( $matches) use( $wildcards) { return !empty( $matches[1]) ? preg_quote( $matches[2]) : $wildcards[$matches[2]]; }, preg_quote( $input)); 

// Finally, test the regex against the input $value, escaping the delimiter if it's present
preg_match( $delimiter . str_replace( $delimiter, '\\' . $delimiter, $regex) . $delimiter .'i', $value, $matches);

// Output is in $matches[0] if there was a match
var_dump( $matches[0]);

This forms a dynamic regex based on $wildcards and $escape in order to replace all "fake" wildcards with their PCRE equivalents, unless the "fake" wildcard character is prefixed with the escape character, in which case, no replacement is made. In order to do this replacement, the $wildcard_replacement_regex is created.

The $wildcard_replacement_regex looks something like this once everything's all said and done:

/((?:\!)?)(%|_)/

So it uses two capturing groups to (optionally) grab the escape character and one of the wildcards. This enables us to test to see if it grabbed the escape character in the callback. If it was able to get the escape character before the wildcard, $matches[1] will contain the escape character. If not, $matches[1] will be empty. This is how I determine whether to replace the wildcard with its PCRE equivalent, or leave it alone by just preg_quote()-ing it.

You can play around with it at codepad.

nickb
  • 59,313
  • 13
  • 108
  • 143
  • Not bad but ** and you would need to chuck a couple of `preg_quote()`s in there I think, if `$escape` or `$wildcards` contain PCRE meta characters – DaveRandom Jul 11 '12 at 14:08
  • Oh and I think you want a non-greedy `.*` (i.e. `.*?`) in order to truly mimic SQL behaviour. – DaveRandom Jul 11 '12 at 14:10
  • Thanks for the feedback @DaveRandom - I just wrote this out in the answer box and now I'm actually trying to run it and fixing all the bugs. I'm going to incorporate your input in the next revision of my answer! Thanks again. – nickb Jul 11 '12 at 14:14
  • @nickb, also thanks for taking your time for this answer. I'm looking forward the complete answer. Take your time. – Shoe Jul 11 '12 at 14:22
  • @Jeffrey - I've updated my answer. Currently I think it's working quite well, minus the few issues I brought up in my post. – nickb Jul 11 '12 at 14:37
  • 'a' LIKE 'a' returns false in SQL and true here. – Shoe Jul 11 '12 at 14:47
  • 1
    @Jeffrey - In SQL, [it also returns true](http://sqlfiddle.com/#!2/b17f9/1). Unless I'm missing something? I can even query that out directly, and [it still is true](http://sqlfiddle.com/#!2/b17f9/2). – nickb Jul 11 '12 at 14:51
  • @nickb, no you are right, it returns true also in SQL I just tested it out. Wow, I've always knew that would return false for some strange reason... – Shoe Jul 11 '12 at 14:55
  • Giving `/` as escape character it triggers a warning error `Warning: preg_replace_callback(): Unknown modifier ')'`. – Shoe Jul 11 '12 at 15:07
  • Yep, that's the last bug, as it matches the regex delimiter. Still thinking about that one, maybe somebody else has a flexible way to deal with that. – nickb Jul 11 '12 at 15:09
  • What about `$delimiters = ($escape == '/') ? '#' : '/';` If I do remember correctly you can use any char as delimiters. – Shoe Jul 11 '12 at 15:11
  • You can, but if the delimiter appears anywhere else in the regex (as in, it's a wildcard character, in the input, or the escape character), then the regex will fail. Just had an idea - Going to update my answer. – nickb Jul 11 '12 at 15:13
  • @Jeffrey - This turned out to be an interesting problem. I've updated my answer to what I believe will be the final version. [Here is a link](http://viper-7.com/yWh6OS) to a demo showing that it is now possible to set the escape character the same as the regex delimiter. It might not work 100%, but it is close, and you should be able to build off it to suit your needs. – nickb Jul 11 '12 at 15:22
  • @nickb, very good job. I promised DaveRandom that I would have checked his answer too before deciding what answer to choose, but you totally deserved a + 1. Thank you very much. – Shoe Jul 11 '12 at 15:29
  • 1
    I have to admit, I can't break this. I can't help feeling you would do better to use the second argument to `preg_quote()` instead of the `str_replace()`s though... ;-) – DaveRandom Jul 11 '12 at 16:04
  • @DaveRandom - But if you do that, won't you end up escaping what you don't want? For example, you just replaced the fake wildcards with their equivalents. If you ran that whole string through `preg_quote()`, you'd end up quoting the PCRE replacements, I think. That's why I had to use `str_replace()`. – nickb Jul 11 '12 at 16:14
  • No I don't mean the whole generated expression, I mean as you generate it. But I do now see what you mean in the context of your code, and I think I've found a hole in it - [consider this](http://viper-7.com/vlLPjO) - you are not `preg_quote()`ing the parts of the search pattern that are not wildcards. – DaveRandom Jul 11 '12 at 16:30
  • @DaveRandom - I'm not seeing what's wrong, can you explain it a little better? It looks like it's properly matching, but then again I've been staring at this too long. – nickb Jul 11 '12 at 16:46
  • In the link I gave, I put a `.` in the expression, and a `j` in the same place in the string to match against. In PCRE this matches (`.` = any char) but in SQL it wouldn't (`.` = literal `.`) – DaveRandom Jul 11 '12 at 16:48
  • I see it now! Great find. All that needs to be changed is to add `preg_quote( $input)` instead of `$input` to the call to `preg_replace_callback`. Just fixed it. – nickb Jul 11 '12 at 16:53
1

You can use regexp, for example: preg_match.

Piotr Olaszewski
  • 6,017
  • 5
  • 38
  • 65
1

The other examples were a bit too complex for my taste (and painful to my clean code eyes), so I reimplemented the functionality in this simple method:

public function like($needle, $haystack, $delimiter = '~')
{
    // Escape meta-characters from the string so that they don't gain special significance in the regex
    $needle = preg_quote($needle, $delimiter);

    // Replace SQL wildcards with regex wildcards
    $needle = str_replace('%', '.*?', $needle);
    $needle = str_replace('_', '.', $needle);

    // Add delimiters, beginning + end of line and modifiers
    $needle = $delimiter . '^' . $needle . '$' . $delimiter . 'isu';

    // Matches are not useful in this case; we just need to know whether or not the needle was found.
    return (bool) preg_match($needle, $haystack);
}

Modifiers:

  • i: Ignore casing.
  • s: Make dot metacharacter match anything, including newlines.
  • u: UTF-8 compatibility.
Marco Roy
  • 4,004
  • 7
  • 34
  • 50