
I need to do string replacements. There are only a few cases I need to handle:

1) optional case-insensitive matching
2) optional whole-word matching

Right now I'm using `_myRegEx.Replace()`. If #1 is specified, I add the `RegexOptions.IgnoreCase` flag; if #2 is specified, I wrap the search word in `\b<word>\b`.
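A minimal sketch of the pattern-building step described above. `SearchPattern.Build` is a hypothetical helper, not code from the post. It also adds `Regex.Escape`, which the post doesn't mention, so that a search word containing characters such as `.` or `+` is matched literally.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: builds one Regex for a search word
// from the two optional settings in the question.
public static class SearchPattern
{
    public static Regex Build(string word, bool ignoreCase, bool wholeWord, bool compiled = false)
    {
        // Escape so regex metacharacters in the word are treated literally.
        string pattern = Regex.Escape(word);

        // Option #2: whole words only, using word-boundary anchors.
        if (wholeWord)
            pattern = @"\b" + pattern + @"\b";

        // Option #1: case-insensitive matching.
        var options = RegexOptions.None;
        if (ignoreCase)
            options |= RegexOptions.IgnoreCase;
        if (compiled)
            options |= RegexOptions.Compiled;

        return new Regex(pattern, options);
    }
}
```

Note that `\b` only behaves as expected when the search word starts and ends with a word character; for a word like `C#`, the trailing boundary would require a word character right after the `#`.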

This works fine, but it's really slow: my benchmark takes 1100 ms versus 90 ms with `String.Replace`. Switching to `String.Replace` has some obvious issues, though:

1) case-insensitive matching is tricky
2) the regex `\b<word>\b` handles `"<word>"`, `" <word>"`, `"<word> "` and `" <word> "`, while a string replace on `" <word> "` would only handle the last one.
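To illustrate point 2, here is a small comparison, assuming the `String.Replace` workaround pads the word with spaces on both sides. `BoundaryDemo` and its methods are hypothetical names used only for this sketch.

```csharp
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Whole-word replacement using regex word boundaries.
    public static string RegexWholeWord(string input, string word, string replacement) =>
        Regex.Replace(input, @"\b" + Regex.Escape(word) + @"\b", replacement);

    // Naive String.Replace emulation: only matches the word
    // when it has a space on BOTH sides.
    public static string PaddedReplace(string input, string word, string replacement) =>
        input.Replace(" " + word + " ", " " + replacement + " ");
}
```

For the input `"cat sat, cat"`, `RegexWholeWord(..., "cat", "dog")` replaces both occurrences, while `PaddedReplace` replaces neither: one is at the start of the line and the other is at the end.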

I'm already using the RegexOptions.Compiled flag.

Any other options?

SledgeHammer
  • @M.kazemAkhgary, I am not using static, I'm using the instance... I have updated the post to clarify. – SledgeHammer Dec 29 '16 at 00:30
  • How do you instantiate the `Regex` instance? Some regex patterns are known to be slow, [like this](http://stackoverflow.com/questions/9687596/slow-regex-performance), and Microsoft even publishes [best practices for using `Regex`](https://msdn.microsoft.com/en-us/library/gg578045(v=vs.110).aspx). – Bagus Tesa Dec 29 '16 at 01:37
  • @BagusTesa Just something simple like `new Regex(@"\bTest\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);` for cases 1 & 2. – SledgeHammer Dec 29 '16 at 04:01
  • I wonder if it has anything to do with `RegexOptions.IgnoreCase` (see [this SOF question](http://stackoverflow.com/questions/10573143/is-it-faster-to-compare-strings-with-regex-with-ignorecase-or-with-tolower-metho)). Try fiddling with it first; I can't think of anything else. Perhaps someone else can help. – Bagus Tesa Dec 29 '16 at 04:25

1 Answer


You can get a noticeable improvement in this case if you don't use a compiled regex. Honestly, this isn't the first time I've measured regex performance and found the compiled regex to be slower, even when it's used the way it's supposed to be used.

Let's replace \bfast\b with 12345 in a string a million times, using four different methods, and time how long this took - on two different PCs:

```csharp
using System;
using System.Text.RegularExpressions;

var str = "Regex.Replace is extremely FAST for simple replacements like that";
var compiled = new Regex(@"\bfast\b", RegexOptions.IgnoreCase | RegexOptions.Compiled);
var interpreted = new Regex(@"\bfast\b", RegexOptions.IgnoreCase);
var start = DateTime.UtcNow;
for (int i = 0; i < 1000000; i++)
{
    // Comment out all but one of these:
    str.Replace("FAST", "12345"); // PC #1: 208 ms, PC #2: 339 ms
    compiled.Replace(str, "12345"); // 1100 ms, 2708 ms
    interpreted.Replace(str, "12345"); // 788 ms, 2174 ms
    Regex.Replace(str, @"\bfast\b", "12345", RegexOptions.IgnoreCase); // 1076 ms, 3138 ms
}
Console.WriteLine((DateTime.UtcNow - start).TotalMilliseconds);
```

Compiled regex is consistently one of the slowest ones. I don't observe quite as big a difference between string.Replace and Regex.Replace as you do, but it's in the same ballpark. So try it without compiling the regex.
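If you'd rather not comment lines in and out, a variation on the benchmark above can time each method in the same run. This is a sketch, not the answerer's code: `ReplaceBenchmark`, `Time` and `Report` are hypothetical names, it switches to `Stopwatch` for finer timer resolution, and it does one untimed warm-up call per method so one-time JIT costs aren't counted. No timings are claimed here; they will vary by machine and runtime. For rigorous numbers, a dedicated tool such as BenchmarkDotNet is a better fit.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ReplaceBenchmark
{
    // Calls `action` once untimed (warm-up), then `iterations` times timed.
    // Returns the elapsed time of the timed loop in milliseconds.
    public static double Time(Action action, int iterations)
    {
        action();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Run(int iterations = 1_000_000)
    {
        var str = "Regex.Replace is extremely FAST for simple replacements like that";
        var compiled = new Regex(@"\bfast\b", RegexOptions.IgnoreCase | RegexOptions.Compiled);
        var interpreted = new Regex(@"\bfast\b", RegexOptions.IgnoreCase);

        Report("string.Replace", () => str.Replace("FAST", "12345"), iterations);
        Report("compiled Regex", () => compiled.Replace(str, "12345"), iterations);
        Report("interpreted Regex", () => interpreted.Replace(str, "12345"), iterations);
        Report("static Regex.Replace",
            () => Regex.Replace(str, @"\bfast\b", "12345", RegexOptions.IgnoreCase), iterations);
    }

    private static void Report(string label, Action action, int iterations) =>
        Console.WriteLine($"{label}: {Time(action, iterations):F0} ms");
}
```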

Also worth noting: if you have just one humongous string, Regex.Replace is blazing fast. It takes about 7 ms for 13,000 lines of Pride and Prejudice on my PC.

Roman Starkov
  • I'm not doing it once on a "BAT" (big a$$ text). I'm doing it repeatedly on a single line of text. For example, take the first sentence of your reply and do a whole-word replace on "that" for 1M iterations (obviously in my case it's different single lines; I'm just giving you an example). I can't batch lines together. They need to be done one at a time. – SledgeHammer Dec 29 '16 at 15:18
  • @SledgeHammer Answer updated. Try it without compiling the regex. Still, in my example Regex is about 4x-7x slower than string.Replace and you probably can't do much about that. – Roman Starkov Dec 29 '16 at 19:34
  • Wow... that is weird! Compiled regex is supposed to be faster. Maybe it's just for more complex regexes (or if you use regex metacharacters?), but yeah, I'm seeing the same result as you: compiled = 370 ms, non-compiled = 205 ms. – SledgeHammer Dec 29 '16 at 19:48
  • OK, but check this out: keep your str as is, but get rid of the \b's (no whole word, since that's optional) and just use a pattern of "like". For 1M reps, compiled = 1100 ms and interpreted = 728 ms... but a simple string.Replace is only 205 ms. – SledgeHammer Dec 29 '16 at 19:57
  • Since case-insensitive and whole-word matching are both optional, what I might try is experimenting with which technique is best for each case and handling them that way, i.e. no options: use String.Replace; whole words: use regex; etc. But taking off the Compiled flag definitely speeds it up on this type of pattern. – SledgeHammer Dec 29 '16 at 19:58
  • I tested some other patterns out of curiosity, and once you start using regex symbols, even with a simple pattern like "\\d*[|]", the compiled regex is faster. – SledgeHammer Dec 29 '16 at 20:08
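The per-case plan from the comments (plain string replacement when no whole-word matching is needed, an interpreted regex without `RegexOptions.Compiled` for whole words) could be sketched roughly as follows. `WordReplacer` is a hypothetical class, and this is one possible reading of that plan, not a measured recommendation. Assumptions worth flagging: the `string.Replace(string, string, StringComparison)` overload used for the case-insensitive path exists on .NET Core 2.0 and later, but not on .NET Framework. `StringComparison.OrdinalIgnoreCase` and `RegexOptions.IgnoreCase` may also fold case differently for some non-ASCII text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical dispatcher: picks the cheapest technique for the options given.
public sealed class WordReplacer
{
    private readonly string _word;
    private readonly StringComparison _comparison;
    private readonly Regex _regex; // null unless whole-word matching is requested

    public WordReplacer(string word, bool ignoreCase, bool wholeWord)
    {
        _word = word;
        _comparison = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;

        if (wholeWord)
        {
            // Interpreted (non-Compiled) regex, built once and reused for every line.
            var options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
            _regex = new Regex(@"\b" + Regex.Escape(word) + @"\b", options);
        }
    }

    public string Replace(string line, string replacement)
    {
        if (_regex != null)
        {
            // Regex replacement strings treat '$' specially; "$$" is a literal '$',
            // so escape it to get the same result as the plain string path.
            return _regex.Replace(line, replacement.Replace("$", "$$"));
        }

        // No whole-word requirement: plain string replacement (requires .NET Core 2.0+).
        return line.Replace(_word, replacement, _comparison);
    }
}
```

Since the lines must be processed one at a time, the replacer is meant to be created once per search word and then called for each line.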