T-SQL Regex for social security number (SQL Server 2008 R2)

Question

I need to find invalid social security numbers in a varchar field in a SQL Server 2008 database table. (Valid SSNs are being defined by being in the format ###-##-#### - doesn't matter what the numbers are, as long as they are in that "3-digit dash 2-digit dash 4-digit" pattern.

I do have a working regex:

SELECT * 
FROM mytable
WHERE ssn NOT LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'

That does find the invalid SSNs in the column, but I know (okay - I'm pretty sure) that there is a way to shorten that to indicate that the previous pattern can have x iterations.

I thought this would work:

'[0-9]{3}-[0-9]{2}-[0-9]{4}'

But it doesn't.

Is there a shorter regex than the one above in the select, or not? Or perhaps there is, but T-SQL/SQL Server 2008 doesn't support it!?

TSQL doesn't have native Regex support. It supports `LIKE` which you're using but that's a very simple language - not really regex at all. A better plan might be to store just 9 digits (quite simple to create constraints for that) and just insert the dashes when *displaying* the SSN - unless you actually need to support storing invalid data. — Damien_The_Unbeliever, Aug 03 '15 at 13:09

score 2 · Accepted Answer · answered Aug 03 '15 at 13:43

If you plan to get a shorter variant of your LIKE expression, then the answer is no.

In T-SQL, you can only use the following wildcards in the pattern:

%
- Any string of zero or more characters. WHERE title LIKE '%computer%' finds all book titles with the word computer anywhere in the book title.

_ (underscore)
Any single character. WHERE au_fname LIKE '_ean' finds all four-letter first names that end with ean (Dean, Sean, and so on).
[ ]
Any single character within the specified range ([a-f]) or set ([abcdef]). WHERE au_lname LIKE '[C-P]arsen' finds author last names ending with arsen and starting with any single character between C and P, for example Carsen, Larsen, Karsen, and so on. In range searches, the characters included in the range may vary depending on the sorting rules of the collation.
[^]
Any single character not within the specified range ([^a-f]) or set ([^abcdef]).

So, your LIKE statement is already the shortest possible expression. No limiting quantifiers can be used (those like {min,max}), not shorthand classes like \d.

If you were using MySQL, you could use a richer set of regex utilities, but it is not the case.

Okay. Thanks for the explanation, stribizhev. – marky Aug 03 '15 at 14:17 — marky, Aug 03 '15 at 14:17

shA.t · Answer 2 · 2015-08-03T18:18:32.913

0

I suggest you to use another solution like this:

-- Use `REPLICATE` if you really want to use a number to repeat
Declare @rgx nvarchar(max) = REPLICATE('#', 3) + '-' +
                             REPLICATE('#', 2) + '-' +
                             REPLICATE('#', 4);

-- or use your simple format string
Declare @rgx nvarchar(max) = '###-##-####';

-- then use this to get your final `LIKE` string.
Set @rgx = REPLACE(@rgx, '#', '[0-9]');

And you can also use something like '_' for characters then replace it with [A-Z] and so on.

edited Aug 03 '15 at 18:18

answered Aug 03 '15 at 14:13

shA.t

16,580
5
54
111

What advantage does this confer to the OP? – Ben Thul Aug 03 '15 at 16:42

T-SQL Regex for social security number (SQL Server 2008 R2)

2 Answers2

Linked