1

I'm using PATINDEX to validate if a column has invalid special chars. But I'm facing some problems with some chars.

SELECT PATINDEX(N'%[^a-zA-Z0-9 !"&''()*+,-./:;?=%~@[]_{}\|<>]%' collate SQL_Latin1_General_CP850_BIN, 'abc╢123' collate SQL_Latin1_General_CP850_BIN)

The valid chars are ! " & ' ( ) * + , - . / : ; ? = % ~ @ [ ] _ { } \ | < >.

The is not one of the valid chars but PATINDEX still returns 0.

Is there another way to validate that?

Salman A
  • 262,204
  • 82
  • 430
  • 521
jpmo22
  • 77
  • 8

2 Answers2

2

The problem seems to be caused by ] in the pattern. You can use LIKE operator along with ESCAPE. Minimal example:

-- valid characters are digits, "[" and "]"
SELECT val
     , expected
     , CASE WHEN val LIKE '%[^0-9[]]%'             THEN 'Bad' ELSE 'Good' END AS wrong_pattern
     , CASE WHEN val LIKE '%[^0-9[\]]%' ESCAPE '\' THEN 'Bad' ELSE 'Good' END AS right_pattern
FROM (VALUES
    ('12345', 'Good'),
    ('[123]', 'Good'),
    ('ABCDE', 'Bad'),
    ('123\5', 'Bad')
) AS t(val, expected)

So your pattern could be written as:

SELECT CASE WHEN 'abc╢123!"&''()*+,-./:;?=%~@[]_{}\|<>' NOT LIKE '%[^a-z0-9 [\]\\!"&''()*+,\-./:;?=%~@_{}|<>]%' ESCAPE '\' THEN 'Good' ELSE 'Bad' END

I escaped ], - and \. The characters % and _ do not behave as wildcards inside square brackets thus not escaped.

Salman A
  • 262,204
  • 82
  • 430
  • 521
  • Thanks for your response. But how can I escape `_` along with other chars? I need to create a case when with all valid chars? – jpmo22 Nov 19 '19 at 15:03
  • 1
    I am unable to test the pattern in your post but the problem is definitely with `]`. I think you only need to change ``\`` to ``\\`` and `]` to `\]`. – Salman A Nov 19 '19 at 15:05
1

Escaping is one method.
But if you just want to ignore the readable ASCII characters, then the range could be simplified.

[^ -~] : not between space and ~

-- Sample data
declare @T table (col NVARCHAR(30) collate SQL_Latin1_General_CP850_BIN primary key);
insert into @T (col) values
(N'abc╢123'),
(N'xyz123[}'''),
(N'abc௹123');

-- Query
SELECT col, PATINDEX(N'%[^ -~]%' collate  SQL_Latin1_General_CP850_BIN, col) as pos
FROM @T;

Returns:

col         pos
--------    ----
abc╢123     4
abc௹123     4
xyz123[}'   0

But to also locate the caret and some others, it's more complicated.
Since PATINDEX doesn't have ESCAPE as LIKE does.

-- Sample data
declare @T table (
 id int identity(1,1) primary key, 
 col NVARCHAR(30) collate SQL_Latin1_General_CP850_BIN
 );
insert into @T (col) values
 (N'xyz[123]}''') -- good
 ,(N'abc╢123') -- bad
,(N'abc௹123') -- bad
,(N'def#456') -- bad
,(N'def^456') -- bad
;

-- also locate #, ´ , ` and ^
SELECT col, 
CASE 
WHEN PATINDEX(N'%[^ !"$-_a-z{-~]%' collate  SQL_Latin1_General_CP850_BIN, col) > 0
THEN PATINDEX(N'%[^ !"$-_a-z{-~]%' collate  SQL_Latin1_General_CP850_BIN, col) 
ELSE CHARINDEX(N'^' collate  SQL_Latin1_General_CP850_BIN, col)
END AS pos
FROM @T;

Returns:

xyz[123]}'  0
abc╢123     4
abc௹123     4
def#456     4
def^456     4
LukStorms
  • 28,916
  • 5
  • 31
  • 45
  • thank you for your response. But I have tested this expression and in fact, the char `╢` is found in with PATINDEX. I also tried with another char `௹` and your expression is not working. I can still use an invalid char. `SELECT PATINDEX('%[^ -~]%' collate SQL_Latin1_General_CP850_BIN, 'abc௹123')` – jpmo22 Nov 21 '19 at 10:56
  • 1
    Doesn't work for `௹`? Okay... that's odd. Probably something to do with that collation... – LukStorms Nov 21 '19 at 11:07
  • 1
    Ok, `SELECT PATINDEX('%[^ -~]%' collate SQL_Latin1_General_CP850_BIN, N'abc௹123')` does return 4. The varchar `'௹'` translates to '?', while the nvarchar `N'௹'` doesn't. I'll fix my answer. – LukStorms Nov 21 '19 at 11:25
  • Yeah, this fixed your answer. And I also believe that your approach is more efficient than the `LIKE` method. But I also need to block `#`, `^`, `´` and ```. and all of them are in-between `SPACE` and `~`. And I can't find a way to add those restrictions to your code. Is it possible? – jpmo22 Nov 21 '19 at 11:57
  • Aargghh, not being able to use ESCAPE for a PATINDEX is really a pain if you want to include `[` and `]` but not `^`. Just look at that ASCII map. The problem is with also accepting `]`. If the caret would be allowed then using a few more ranges would work for those others : `PATINDEX(N'%[^ !"$-_a-z{-~]%' collate SQL_Latin1_General_CP850_BIN, col)`. – LukStorms Nov 21 '19 at 13:03
  • @jpmo22 Added an extra solution, but I don't really like it... Looks like MS did receive at least one request to allow an escape also for PATINDEX. But it never gained enough support to implement it. Oh well. – LukStorms Nov 21 '19 at 13:40