1

I have a column with the name of a person in the following format: "LAST NAME, FIRST NAME"

  • Only Upper Cases Allowed
  • Space after comma optional

I would like to use a regular expression like: [A-Z]+,[ ]?[A-Z]+ but I do not know how to do this in T-SQL. In Oracle, I would use REGEXP_LIKE, is there something similar for SQL Server 2016?

I need something like the following:

UPDATE table 
SET is_correct_format = 'YES'
WHERE REGEXP_LIKE(table.name,'[A-Z]+,[ ]?[A-Z]+');
user7792598
  • 177
  • 1
  • 6
  • 17
  • 4
    TSQL doesn't support regular expressions. You need to use CLR for this. Or a much more convoluted TSQL expression without regex. You might want to see [Falsehoods Programmers Believe About Names](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/) though. – Martin Smith Aug 15 '17 at 17:51
  • Not standard in SQL Server. However have a look at [SQL#](https://sqlsharp.com/features/) (see RegEx). – TT. Aug 15 '17 at 18:03
  • Is this for input validation? – S3S Aug 15 '17 at 18:06

2 Answers2

2

First, case sensitivity depends on the collation of the DB, though with LIKE you can specify case comparisons. With that... here is some Boolean logic to take care of the cases you stated. Though, you may need to add additional clauses if you discover some bogus input.

declare @table table (Person varchar(64), is_correct_format varchar(3) default 'NO')
insert into @table (Person)
values
('LowerCase, Here'),
('CORRECTLY, FORMATTED'),
('CORRECTLY,FORMATTEDTWO'),
('ONLY FIRST UPPER, LowerLast'),
('WEGOT, FormaNUMB3RStted'),
('NoComma Formatted'),
('CORRECTLY, TWOCOMMA, A'),
(',COMMA FIRST'),
('COMMA LAST,'),
('SPACE BEFORE COMMA , GOOD'),
(' SPACE AT BEGINNING, GOOD')


update @table
set is_correct_format = 'YES'
where 
        Person not like '%[^A-Z, ]%'                                                    --check for non characters, excluding comma and spaces
    and len(replace(Person,' ','')) = len(replace(replace(Person,' ',''),',','')) + 1   --make sure there is only one comma
    and charindex(',',Person) <> 1                                                      --make sure the comma isn't at the beginning
    and charindex(',',Person) <> len(Person)                                            --make sure the comma isn't at the end
    and substring(Person,charindex(',',Person) - 1,1) <> ' '                            --make sure there isn't a space before comma
    and left(Person,1) <> ' '                                                           --check preceeding spaces
    and UPPER(Person) = Person collate Latin1_General_CS_AS                             --check collation for CI default (only upper cases)

select * from @table
S3S
  • 24,809
  • 5
  • 26
  • 45
  • I like this approach too. Some trivial cases: trailing/preceeding spaces. Multiple spaces after the comma. I'm not sure if 'ONLY FIRST UPPER' should be a valid match. "[A-Z]+," implies no spaces before the comma. – Ryan B. Aug 15 '17 at 18:41
  • Trailing spaces are ignored in comparison operations in SQL Server, so they have no real affect though the leading spaces could be checked for sure. I'll add that in. The multiple spaces I didn't account for in case there was two or three part names, common in the Hispanic culture among others but a good point. – S3S Aug 15 '17 at 18:44
1

The tsql equivalent could look like this. I'm not vouching for the efficiency of this solution.

declare @table as table(name varchar(20), is_Correct_format varchar(5))
insert into @table(name) Values
('Smith, Jon')
,('se7en, six')
,('Billy bob')


UPDATE @table 
SET is_correct_format = 'YES'
WHERE
replace(name, ', ', ',x')
     like (replicate('[a-z]', charindex(',', name) - 1)
         + ','
         + replicate('[a-z]', len(name) - charindex(',', name)) )


select * from @table

The optional space is hard to solve, so since it's next to a legal character I'm just replacing with another legal character when it's there.

TSQL does not provide the kind of 'repeating pattern' of * or + in regex, so you have to count the characters and construct the pattern that many times in your search pattern.

I split the string at the comma, counted the alphas before and after, and built a search pattern to match.

Clunky, but doable.

Ryan B.
  • 3,575
  • 2
  • 20
  • 26