4

I have some text data in an SQL Server 2014 table in which I want to detect complex patterns and extract certain portions of the text if the text matches the pattern. Because of this, I need capturing groups.

E.g.
From the text

"Some title, Some Journal name, vol. 5, p. 20-22"

I want to grab the volume number

, vol\. ([0-9]+), p\. [0-9]+

Mind that I have simplified this use-case to improve readability. The above use-case could be solved without capturing groups. The actual use-case handles a lot more exceptions, like:

  • The journal/title containing "vol.".
  • Volume numbers/pages containing letters
  • "vol" being followed by ":" or ";" instead of "."
  • ...

The actual regex I use is the following (yet, this is not a question on regex structure, just elaborating on why I need capturing groups).

(^|§|[^a-z0-9])vol[^a-z0-9]*([a-z]?[0-9]+[a-z]?)

As far as I know, there are two ways of getting Regex functionality into SQL Server.

Since installing and setting up the entire Master Data Services package felt like overkill to get some Regex functionality, I was hoping there'd be an easy, common way out...

Wouter
  • 1,829
  • 3
  • 28
  • 34
  • 2
    " I was hoping there'd be an easy, common way out" - yes CLR. .NET regular expressions support capturing groups so write the code you need and deploy it. – Martin Smith Sep 09 '16 at 08:15
  • @MartinSmith: That's already good to know. I'm new to SQL Server (coming from Oracle) and new to .NET. I don't even know yet what "CLR" *is* aside from that it can help me use the .NET regex functionality. Hence I was wondering if there are any solutions available that I don't have to code myself. Like... plug and play. Since I'm assuming I'm not the first person that needs this functionality. – Wouter Sep 09 '16 at 08:41
  • @Wouter to learn more about working with SQLCLR in general, please see the series I am writing on SQL Server Central on this topic: [Stairway to SQLCLR](http://www.sqlservercentral.com/stairway/105855/) (FYI: free registration is required to read content on that site, but it's worth it :-). – Solomon Rutzky May 22 '17 at 20:33

1 Answers1

3

I have found a CLR implementation that is super easy to install, and includes Regex capturing group functions.

http://www.sqlsharp.com/

I have installed this in a separate database called 'SQL#' (simply by using the provided installation .sql script), and the functions are located inside a schema with the same name. As a result I can use the function as follows:

select SQL#.SQL#.RegEx_CaptureGroup( 'test (2005) test', '\((20[012][0-9]|19[5-9][0-9])\)', 1, NULL, 1, -1, '');

Would be nice if this was included by default in SQL Server...

Wouter
  • 1,829
  • 3
  • 28
  • 34
  • 2
    Hello. I'm the creator of [SQL#](http://www.SQLsharp.com/) and wanted to mention a few things: 1) I recently released v 4.0 which includes many performance enhancements, including **RegEx_CaptureGroup**, 2) there is now a **RegEx_CaptureGroupCapture** function to get a specific repeat of a specific group, 3) Full version (i.e. not free) includes TVFs **RegEx_CaptureGroups** and **RegEx_CaptureGroupCaptures** to get all instances within a string, and 4) when working with <= 4k characters, use the 4k versions of RegEx and String functions for much better performance: **RegEx_CaptureGroup4k** – Solomon Rutzky May 22 '17 at 20:19