
I've been working with a C# regular expression which is used heavily as part of a custom templating system in a web application. The expression is complex, and I have noticed real performance gains from using the Regex.Compiled option. However, the initial cost of compilation is irritating during development, especially during iterative unit testing (this general tradeoff is mentioned here).
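A rough timing sketch can make this tradeoff visible for a specific pattern. It times two things separately, with and without `RegexOptions.Compiled`: constructing a `Regex` plus running its first match, and then 1,000 further matches. The pattern and input are hypothetical stand-ins for the templating regex, and `RegexTradeoffDemo` is an illustrative name. Single-run `Stopwatch` timings are noisy, so treat the numbers as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Rough timing sketch (not a rigorous benchmark): separates the one-off cost of
// building a Regex and running its first match from the cost of repeated matching.
public static class RegexTradeoffDemo
{
    public static (TimeSpan FirstUse, TimeSpan SteadyState) Measure(
        string pattern, string input, RegexOptions options, int iterations)
    {
        // First use: parse the pattern, construct the Regex (with RegexOptions.Compiled
        // this includes emitting IL) and run one match, which triggers JIT compilation.
        var sw = Stopwatch.StartNew();
        var regex = new Regex(pattern, options);
        int total = regex.Matches(input).Count; // Count forces every match to be found
        TimeSpan firstUse = sw.Elapsed;

        // Steady state: reuse the already-constructed instance.
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            total += regex.Matches(input).Count;
        }
        TimeSpan steadyState = sw.Elapsed;

        return (firstUse, steadyState);
    }

    public static void PrintComparison()
    {
        // Warm up the Regex infrastructure itself so the first measurement is not
        // penalized for JIT-compiling the library code.
        new Regex("warm").IsMatch("warm-up");

        // Hypothetical template-tag pattern and input; substitute the real ones.
        const string pattern = @"\{\{\s*(\w+)\s*\}\}";
        string input = string.Concat(Enumerable.Repeat("Hello {{ name }}, you have {{count}} items. ", 200));

        foreach (var options in new[] { RegexOptions.None, RegexOptions.Compiled })
        {
            var (firstUse, steadyState) = Measure(pattern, input, options, 1000);
            Console.WriteLine(
                $"{options}: first use {firstUse.TotalMilliseconds:F1} ms, " +
                $"1000 more runs {steadyState.TotalMilliseconds:F1} ms");
        }
    }
}
```

Calling `RegexTradeoffDemo.PrintComparison()` from a console app prints one line per option. For a pattern where `Compiled` pays off, the "first use" figure should be larger and the "1000 more runs" figure smaller.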

One solution I'm currently trying is lazy regex compilation. The idea is that I can get the best of both worlds by creating a compiled version of the Regex in a separate thread and subbing it in when ready.

My question is: is there any reason why this might be a bad idea, performance-wise or otherwise? I ask because I'm not sure whether distributing the cost of things like jitting and assembly loading across threads really works (although it appears to from my benchmarks). Here's the code:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;

public class LazyCompiledRegex
{
    private volatile Regex _regex;

    public LazyCompiledRegex(string pattern, RegexOptions options)
    {
        if (options.HasFlag(RegexOptions.Compiled)) { throw new ArgumentException("Compiled should not be specified!"); }

        // The interpreted regex is available to callers immediately
        this._regex = new Regex(pattern, options);

        // Build the compiled version on a thread-pool thread and swap it in when ready
        ThreadPool.QueueUserWorkItem(_ =>
        {
            var compiled = new Regex(pattern, options | RegexOptions.Compiled);
            // obviously, the count will never be null. However the point here is just to force an evaluation
            // of the compiled regex so that the cost of loading and jitting the assembly is incurred here rather
            // than on the thread doing real work
            if (Equals(null, compiled.Matches("some random string").Count)) { throw new Exception("Should never get here"); }

            Interlocked.Exchange(ref this._regex, compiled);
        });
    }

    public Regex Value { get { return this._regex; } }
}
```
ChaseMedallion
  • Why not use `Lazy`? – leppie Mar 14 '13 at 12:33
  • Can you not somehow leverage the existing Regex cache? http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.cachesize.aspx – spender Mar 14 '13 at 12:33
  • honestly, this might be better suited for [codereview](http://codereview.stackexchange.com/) – hometoast Mar 14 '13 at 12:33
  • What would you do if the background thread hasn't compiled the regex yet and external code wants to get `Value` right now? – Dennis Mar 14 '13 at 12:50
  • @Dennis: note that a non-compiled version is made immediately available for consumption. The compiled version is then switched in later using Interlocked.Exchange. – ChaseMedallion Mar 14 '13 at 13:07
  • @spender in my understanding, the Regex cache is for caching static regex strings used with the static Regex methods. By creating a Regex object, the parsing and "compilation" of the regex is automatically cached. However, the Compiled option goes a step further by generating an IL version of the Regex for additional performance – ChaseMedallion Mar 14 '13 at 13:09
  • @leppie my goal here is to be able to use the Regex BEFORE compilation finishes. Lazy would still make me wait around for a long time the first time the regex is used. – ChaseMedallion Mar 14 '13 at 13:11
  • +1 for *asking the right question*. I've seen so many other questions/answers where people use the `Compiled` option on simple regexes like `\w+` that they only use once, just because someone told them it's faster. Sometimes they even admit that performance was already okay and that there's no noticeable speed-up, but they keep doing it because there's no noticeable slow-down, either! +1 for @leppie's `CompileToAssembly` suggestion, too. ;) – Alan Moore Mar 14 '13 at 14:28
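The following is a sketch of one possible variant of the class in the question, not the asker's code. It keeps the same design: callers get the interpreted `Regex` until the compiled one is ready. The difference is that compilation runs through `Task.Run` instead of `ThreadPool.QueueUserWorkItem`. That matters for failure handling. An unhandled exception in a `QueueUserWorkItem` callback terminates the process. An exception inside a `Task` is captured by the task instead, and from .NET 4.5 on an unobserved task exception does not crash the process by default. The names `BackgroundCompiledRegex`, `CompilationTask` and `IsCompiled` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Sketch of a variant of LazyCompiledRegex (BackgroundCompiledRegex is a hypothetical name).
// Value is usable immediately (interpreted) and switches to the compiled Regex later.
public sealed class BackgroundCompiledRegex
{
    private volatile Regex _regex;

    public BackgroundCompiledRegex(string pattern, RegexOptions options)
    {
        if ((options & RegexOptions.Compiled) != 0)
        {
            throw new ArgumentException("Compiled should not be specified.", nameof(options));
        }

        // Building the interpreted Regex first also validates the pattern, so an
        // invalid pattern fails here, on the caller's thread.
        _regex = new Regex(pattern, options);

        CompilationTask = Task.Run(() =>
        {
            var compiled = new Regex(pattern, options | RegexOptions.Compiled);

            // Run one match so at least part of the JIT cost is paid on this
            // background thread, as in the original class.
            _ = compiled.Matches("some random string").Count;

            // A volatile write is enough to publish the new reference; nothing
            // needs the previous value, so Interlocked.Exchange is not required.
            _regex = compiled;
        });
    }

    // Lets warm-up code or tests wait for the swap. If compilation throws, the
    // exception is stored in this task instead of escaping on a ThreadPool thread.
    public Task CompilationTask { get; }

    public Regex Value => _regex;

    public bool IsCompiled => (_regex.Options & RegexOptions.Compiled) != 0;
}
```

Exposing the task also gives unit tests a deterministic way to wait for the compiled version (`CompilationTask.Wait()`). Tests that don't care can ignore it and keep using the interpreted regex.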

1 Answer


It sounds like you want to use `Regex.CompileToAssembly` as a compile-time step.

leppie
  • Is there an easy way to add this into the build process while still allowing me to easily iterate on my regex (e.g. to add features to our templating language)? – ChaseMedallion Mar 14 '13 at 13:21
  • @ChaseMedallion: you can write a little command line app and call that as part of the pre/post build step. – leppie Mar 14 '13 at 13:27
  • Or wrap it in an MSBuild task and use directly from a Target. – Ran Jun 28 '13 at 02:36
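The answer and the follow-up comments suggest precompiling the pattern with `Regex.CompileToAssembly` from a small command-line tool run as a pre- or post-build step. A minimal sketch of such a tool might look like the following. Several details are hypothetical: the pattern-file argument, the generated class name `TemplateRegex`, the namespace `MyApp.Templating.Generated` and the assembly name `MyApp.TemplateRegexes`. Note that `Regex.CompileToAssembly` only works on .NET Framework. On .NET Core and .NET 5+ it throws `PlatformNotSupportedException`.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical pre-build tool. The pattern lives in a plain text file so it can be
// edited and rebuilt freely; this tool turns it into a precompiled Regex subclass.
// Regex.CompileToAssembly is supported only on .NET Framework; on .NET Core and
// .NET 5+ it throws PlatformNotSupportedException.
public static class RegexPrecompiler
{
    public static int Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: RegexPrecompiler <pattern-file>");
            return 1;
        }

        // Strip only the trailing newline an editor may add; other whitespace
        // can be significant in a pattern.
        string pattern = File.ReadAllText(args[0]).TrimEnd('\r', '\n');

        var info = new RegexCompilationInfo(
            pattern,
            RegexOptions.None,            // substitute the options the templating engine uses
            "TemplateRegex",              // hypothetical name of the generated Regex subclass
            "MyApp.Templating.Generated", // hypothetical namespace for that class
            true);                        // generate a public class

        // Emits MyApp.TemplateRegexes.dll (the .NET Framework implementation
        // saves it in the current working directory).
        Regex.CompileToAssembly(new[] { info }, new AssemblyName("MyApp.TemplateRegexes"));
        Console.WriteLine("Precompiled pattern into MyApp.TemplateRegexes.dll");
        return 0;
    }
}
```

Keeping the pattern in its own file is one way to address the iteration concern raised in the comments. You edit the file and rebuild. The web project references the generated DLL and creates the regex with `new MyApp.Templating.Generated.TemplateRegex()` instead of calling the `Regex` constructor.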