9

I'm looking to convert a SQL like statement on the fly to the equivalent regex i.e.

LIKE '%this%'
LIKE 'Sm_th'
LIKE '[C-P]arsen'

What's the best approach to doing this?

P.S. I'm looking to do this on the .Net Framework (C#).

eLRuLL
  • 18,488
  • 9
  • 73
  • 99
Bittercoder
  • 11,753
  • 10
  • 58
  • 76

4 Answers4

10

The following Regex converts an SQL like pattern into a Regex pattern with the help of a MatchEvaluator delegate. It correctly handles square bracket blocks and escapes special Regex characters.

string regexPattern = "^" + Regex.Replace(
    likePattern,
    @"[%_]|\[[^]]*\]|[^%_[]+",
    match =>
    {
        if (match.Value == "%")
        {
            return ".*";
        }
        if (match.Value == "_")
        {
            return ".";
        }
        if (match.Value.StartsWith("[") && match.Value.EndsWith("]"))
        {
            return match.Value;
        }
        return Regex.Escape(match.Value);
    }) + "$";
Nathan Baulch
  • 20,233
  • 5
  • 52
  • 56
  • If you use the regex generated by this code, "hello%" will match "Say hello" which isn't correct. – Jim Berg Aug 18 '21 at 07:37
  • Good point @JimBerg, I've added start/end anchors. – Nathan Baulch Aug 19 '21 at 10:52
  • I used your answer, but put in a special check for when the % is at the beginning or end or both. The SQL expression is the opposite of Regex. If you want '%hello%', it translates to just 'hello'. 'hello' translates to '^hello$'. 'hello%' is '^hello' and '%hello' is 'hello$'. I don't know if anyone ever uses the % in the middle like 'hello%world' which will match anything that starts with hello and ends with world. Your answer does seem to deal with that properly. It's good to see you're still active after 9 years. :-) – Jim Berg Aug 20 '21 at 15:51
3

In addition to @Nathan-Baulch's solution you can use the code below to also handle the case where a custom escape character has been defined using the LIKE '!%' ESCAPE '!' syntax.

   public Regex ConvertSqlLikeToDotNetRegex(string regex, char? likeEscape = null)
   {
        var pattern = string.Format(@"
            {0}[%_]|
            [%_]|
            \[[^]]*\]|
            [^%_[{0}]+
            ", likeEscape);

        var regexPattern = Regex.Replace(
            regex,
            pattern,
            ConvertWildcardsAndEscapedCharacters,
            RegexOptions.IgnorePatternWhitespace);

        regexPattern = "^" + regexPattern + "$";

        return new Regex(regexPattern,
            !m_CaseSensitive ? RegexOptions.IgnoreCase : RegexOptions.None);
    }

    private string ConvertWildcardsAndEscapedCharacters(Match match)
    {
        // Wildcards
        switch (match.Value)
        {
            case "%":
                return ".*";
            case "_":
                return ".";
        }

        // Remove SQL defined escape characters from C# regex
        if (StartsWithEscapeCharacter(match.Value, likeEscape))
        {
            return match.Value.Remove(0, 1);
        }

        // Pass anything contained in []s straight through 
        // (These have the same behaviour in SQL LIKE Regex and C# Regex)
        if (StartsAndEndsWithSquareBrackets(match.Value))
        {
            return match.Value;
        }

        return Regex.Escape(match.Value);
    }

    private static bool StartsAndEndsWithSquareBrackets(string text)
    {
        return text.StartsWith("[", StringComparison.Ordinal) &&
               text.EndsWith("]", StringComparison.Ordinal);
    }

    private bool StartsWithEscapeCharacter(string text, char? likeEscape)
    {
        return (likeEscape != null) &&
               text.StartsWith(likeEscape.ToString(), StringComparison.Ordinal);
    }
TheEadie
  • 51
  • 3
2

From your example above, I would attack it like this (I speak in general terms because I do not know C#):

Break it apart by LIKE '...', put the ... pieces into an array. Replace unescaped % signs by .*, underscores by ., and in this case the [C-P]arsen translates directly into regex.

Join the array pieces back together with a pipe, and wrap the result in parentheses, and standard regex bits.

The result would be:

/^(.*this.*|Sm.th|[C-P]arsen)$/

The most important thing here is to be wary of all the ways you can escape data, and which wildcards translate to which regular expressions.

% becomes .*
_ becomes .
Martin
  • 5,945
  • 7
  • 50
  • 77
  • One thing to note, is that if the sql like pattern contains \, then you want to escape that in the regex. Regex.Escape would help here – mrwaim Aug 17 '11 at 13:51
-1

I found a Perl module called Regexp::Wildcards. You can try to port it or try Perl.NET. I have a feeling you can write something up yourself too.

Eugene Yokota
  • 94,654
  • 45
  • 215
  • 319