1

I've done a ton of Googling on this and can't find the answer. Or, at least, not the answer I am hoping to find. I am attempting to convert a REGEXP_SUBSTR search from Teradata into T-SQL on SQL Server 2016.

This is the way it is written in Teradata:

REGEXP_SUBSTR(cn.CONTRACT_PD_AOR,'\b([a-zA-Z]{2})-([[:digit:]]{2})-([[:digit:]]{3})(-([a-zA-Z]{2}))?\b')

The numbers in the curly brackets specify the number of characters that can match the specific REGEXP. So, this is looking for a contract number that look like this format: XX-99-999-XX

Is this not possible in T-SQL? Specifying the amount of characters to look at? So I would have to write something like this:

where CONTRACT_PD_AOR like '[a-zA-Z][a-zA-Z]-[0-9][0-9]-[0-9][0-9][0-9]-[a-zA-Z][a-zA-Z]%'

Is there not a simpler way to go about it?

PM 77-1
  • 12,933
  • 21
  • 68
  • 111
WannabeDev
  • 15
  • 4
  • SQL Server doesn't support Regex, so the short answer is "no". The long answer is that is you need true Regex support, you'll need to invest in CLR functions. – Thom A Feb 06 '23 at 22:01
  • Man, that's crazy. Thanks for the info! – WannabeDev Feb 06 '23 at 22:15
  • IMHO your best move is to migrate from SQL Server to Postgres. Not only will you save money, you'll enjoy a modern, fully-functional database. – Bohemian Feb 06 '23 at 23:18
  • Potential background: I believe the SQL Server LIKE operator was intentionally limited to expressions that can be evaluated in linear time with a single pass through the string being checked. No generated finite-state-machine. No backtracking. No lookahead. No repetition qualifiers. No Nesting. Nothing that could make performance unpredictable. (And no reference to support my belief.) – T N Feb 06 '23 at 23:50

1 Answers1

0

While not an answer, with this method it makes things a little less panful. This is a way to set a format and reuse it if you'll need it multiple times in your code while keeping it clean and readable.

Set a format variable at the top, then do the needed replaces to build it. Then use the format name in the code. Saves a little typing, makes your code less fugly, and has the benefit of making that format variable reusable should you need it in multiple queries without all that typing.

Declare @fmt_CONTRACT_PD_AOR nvarchar(max) = 'XX-99-999-XX';
Set @fmt_CONTRACT_PD_AOR = REPLACE(@fmt_CONTRACT_PD_AOR, '9', '[0-9]');
Set @fmt_CONTRACT_PD_AOR = REPLACE(@fmt_CONTRACT_PD_AOR, 'X', '[a-zA-Z]');

with tbl(str) as (
select 'AA-23-234-ZZ' union all
select 'db-32-123-dd' union all
select 'ab-123-88-kk'
)
select str from tbl
where str like @fmt_CONTRACT_PD_AOR;
Gary_W
  • 9,933
  • 1
  • 22
  • 40
  • Nice work around. But there are many other formats that it is searching for and new formats are added somewhat frequently. I guess I could just declare and set as many variables as we are searching for. – WannabeDev Feb 06 '23 at 23:36