4

I started searching for a decent regular expression engine, which led me to the page "Benchmark of Regex Libraries". I decided to use RE2 because it appears to be the best FSA-based engine on that list.

My final application will be built with WPF in C#. The regex library will mostly be used in batch mode. However, most of the other business logic will be written in C#, so I am planning to call the RE2 library from C#.

If anyone has done something similar, or has simply used RE2 from C#, and has advice or pointers, please share them.

Thanks.

Pranav Shah
  • 2
    Are you sure you need the potential speed boost of using a C++ regex library when C#/.NET comes with a pretty solid one? Seems like premature optimization to me. – Daniel DiPaolo Jan 03 '11 at 16:54
  • 2
    It's not premature; it's a realized optimization. The number of matches that need to be done per minute requires it. In a different environment we were using the C++ Boost library, and there was a separate web service, made with a third-party program, that simply used this C++ web service. – Pranav Shah Jan 03 '11 at 17:10
  • 2
    Still, I notice that the .NET engine is not in that list. At the very least you should compare it against RE2, from C#. – H H Jan 03 '11 at 17:40
  • 1
    It would be sensible to A/B test the .NET Regex engine before dismissing it. I would be very interested to know the results. – Tim Lloyd Jan 04 '11 at 01:30
  • RE2 does not just offer a speed boost. It's also the only regex library I know of that can guarantee polynomial execution time for a sensible subset of expressions. All others use unbounded backtracking and thus present a DoS threat in scenarios where they take regexes from untrusted sources. – John Feb 24 '22 at 10:45
  • I’m voting to close this question because it is better suited to Code Review. – Adrian Mole Aug 19 '22 at 17:09

2 Answers

2

I have used Re2.Net in an application, and it is the best regex filtering tool I have worked with. In most cases it gave me more than a 10x performance improvement.

You can download the source code (written in C++) and the .NET libraries here: https://github.com/0xcb/Re2.Net

It depends on your requirements, though. If you have a text file and need to filter it using a list of regular expressions, I would recommend grep. If you are trying to filter a large data set using a huge set of regexes and performance is a concern, then you could go with RE2. But beware of the syntax limitations, which are listed at https://code.google.com/p/re2/wiki/Syntax.

Srivathsal
1

Yeah, I'm with Daniel on this one . . . before I'd go hunting for some rogue implementation of regular expressions . . . I'd make sure I was pre-compiling regular expressions where I could, that greedy options, etc. were all set correctly, and that the regular expressions themselves were not "dumb" in some way . . .

I suspect replacing the standard package is not the optimal solution. Of course, without knowing more about your requirements it is hard to know for sure . . . but if the impact of the regex package on your performance is that huge, I'd look at the regular expressions themselves first.
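To make the pre-compiling and greediness points concrete, here is a minimal sketch using only the standard System.Text.RegularExpressions API. The pattern, the sample input, the iteration count and the class name are all hypothetical, chosen purely for illustration; real timings depend on your own patterns and data, so treat this as a starting point for a measurement rather than a result.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class CompiledRegexSketch
{
    // Hypothetical pattern, for illustration only.
    const string Pattern = @"\b\d{3}-\d{4}\b";

    // Build the Regex once and reuse it. RegexOptions.Compiled trades a
    // slower construction for faster matching afterwards.
    static readonly Regex Precompiled = new Regex(Pattern, RegexOptions.Compiled);

    static void Main()
    {
        string input = "call 555-1234 or 555-9876 before noon"; // hypothetical input
        const int iterations = 100000;                           // hypothetical workload

        // Anti-pattern: parsing the pattern again on every call.
        Stopwatch sw = Stopwatch.StartNew();
        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (new Regex(Pattern).IsMatch(input)) hits++;
        }
        sw.Stop();
        Console.WriteLine("new Regex per call: {0} ms, {1} hits", sw.ElapsedMilliseconds, hits);

        // Reusing the single precompiled instance.
        sw = Stopwatch.StartNew();
        hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (Precompiled.IsMatch(input)) hits++;
        }
        sw.Stop();
        Console.WriteLine("precompiled, reused: {0} ms, {1} hits", sw.ElapsedMilliseconds, hits);

        // Greedy vs. lazy quantifiers change what is matched and how much
        // of the input the engine has to scan and backtrack over.
        Console.WriteLine(Regex.Match("<a><b>", "<.+>").Value);  // prints "<a><b>"
        Console.WriteLine(Regex.Match("<a><b>", "<.+?>").Value); // prints "<a>"
    }
}
```

If the reused, compiled instance is still too slow for the required matches per minute, that is a reasonable point at which to benchmark an alternative engine against the same inputs.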

Frank

Frank Merrow
  • 1
    Based on the remarks here, I have looked into compiled regular expressions in .NET. As soon as I have the chance, I am going to run a test comparing the .NET engine with RE2. I will post the results here. – Pranav Shah Jan 04 '11 at 15:43