I have some text data in an SQL Server 2014 table in which I want to detect complex patterns and extract certain portions of the text if the text matches the pattern. Because of this, I need capturing groups.
E.g.
From the text
"Some title, Some Journal name, vol. 5, p. 20-22"
I want to grab the volume number
, vol\. ([0-9]+), p\. [0-9]+
Mind that I have simplified this use-case to improve readability. The above use-case could be solved without capturing groups. The actual use-case handles a lot more exceptions, like:
- The journal/title containing "vol.".
- Volume numbers/pages containing letters
- "vol" being followed by ":" or ";" instead of "."
- ...
The actual regex I use is the following (yet, this is not a question on regex structure, just elaborating on why I need capturing groups).
(^|§|[^a-z0-9])vol[^a-z0-9]*([a-z]?[0-9]+[a-z]?)
As far as I know, there are two ways of getting Regex functionality into SQL Server.
- Through CLR: https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/ . Yet, this example (from 2009) does not support groups. Are there any commonly used solutions out there that do?
- By installing Master Data Services
Since installing and setting up the entire Master Data Services package felt like overkill to get some Regex functionality, I was hoping there'd be an easy, common way out...