SQL PATINDEX patterns for filtering

Question

I am trying to filter a string to be used for as a subject line for Outlook mail. I use a filter b/c some strings that have special characters will convert a subject line into Unicode which I want to prevent. I am trying to use PATINDEX but can not get brackets into the pattern, does anyone know how to do this and/or have a good site to share on a guide to creating patterns for PATINDEX. Here is what I have:

-- First replaces special characters with similar safe ones.
-- Finally excludes all other characters not specified in the pattern.
-- =============================================

ALTER Function [dbo].[RemoveSpecialCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
BEGIN
    DECLARE @KeepValues as varchar(50)

    SET @Temp = Replace(Replace(Replace(Replace(Replace(Replace(Replace(@Temp,'”','"'),'“','"'),'‘','`'),'’','`'),'–','-'),'[','('),']',')')
    SET @KeepValues = '%[^a-z0-9()."`:;,#&+*\/-]%'

    WHILE PatIndex(@KeepValues, @Temp) > 0
        SET @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    RETURN @Temp
END

As you can see since I could not figure out how to get brackets in the pattern, for now I'm replacing them with parenthesis.

SET @KeepValues = '%[^a-z0-9()."`:;,#&+*\/-[[][]]]%' – George Mastros Mar 07 '14 at 15:52 — George Mastros, Mar 07 '14 at 15:52

Kevin Suchlicki · Answer 1 · 2014-03-07T15:06:02.870

0

SET @KeepValues = '%[^a-z0-9()."`:;,#&+*\/-[]][[]]%'

Left bracket = [[] Right bracket = []]

Edit:

The below avoids PATINDEX completely to accomplish the same thing. Let me know if it works for you.

DECLARE @input VARCHAR(1000) = 'th,is i[s @a] [[.[ te[  st '
        , @Temp VARCHAR(1000) = ''
        , @index INT = 1
        , @char CHAR(1);

WHILE @index <= LEN(@input)
BEGIN
    SET @char = SUBSTRING(@input, @index, 1);
    IF @char LIKE '[a-z0-9()."`:;,#&+*\/-]' OR @char = '[' OR @char = ']'
        SET @temp += @char;
    SET @index += 1;
END;
PRINT @Temp;

edited Mar 07 '14 at 15:06

answered Mar 07 '14 at 14:30

Kevin Suchlicki

3,096
2
15
17

If I do this it will also allow @ sign, ~,!, test it and you will see. – Angel Cloudwalker Mar 07 '14 at 14:35
Yeah, Jarvis... I just tested it some more & my solution allows brackets to be kept, but it also leaves spaces... weird. – Kevin Suchlicki Mar 07 '14 at 14:36
@Jarvis I tried a totally different method, but seems to work now. What do you think? – Kevin Suchlicki Mar 07 '14 at 15:06
Can we put space in the pattern? or just put @char ='' in the like condition? – Angel Cloudwalker Mar 07 '14 at 15:26
I don't see why not, if you want to allow spaces... If this works for you, please mark it as answered. – Kevin Suchlicki Mar 07 '14 at 15:31

score 0 · Answer 2 · answered Mar 07 '14 at 16:28

-- First replaces special characters with similar safe ones.

It appears as though you want to replace open quote with regular quote, close quote with regular quote etc... There's a trick you can use to do this without having to mess around with a bunch of replaces. I don't pretend to know everything there is to know about collations, but I do know that they can do some pretty strange things you don't expect and sometimes you can use that to your advantage.

For example:

Declare @Temp VarChar(100)
SET @Temp = '”“‘’–'
Select @Temp,
        @Temp Collate SQL_Latin1_General_Cp850_CI_AS

When you run the code above, you will see that the data type is varchar. We set a string with open quote, close quote, etc... If you do not specify a collation, sql will return the data using the default database collation. If you do specify a collation, SQL will return the data using the collation.

As you can see, all of the special characters that you want to replace with "safe" special characters are in fact replaced.

Pulling this all together:

Alter Function [dbo].[RemoveSpecialCharacters](@Temp VarChar(1000))
Returns VarChar(1000) 
AS
BEGIN
    DECLARE @KeepValues as varchar(50)

    SET @Temp = @Temp Collate SQL_Latin1_General_Cp850_CI_AS
    SET @KeepValues = '%[^a-z0-9()."`'':;,#&+*\/-[[][]]]%'

    WHILE PatIndex(@KeepValues, @Temp) > 0
        SET @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    RETURN @Temp Collate Database_Default
END

Note that inside the function, the data is converted to a different collation, and the return value converts it back to the database's default collation.

I also put the open square bracket and close square bracked in to the KeepValues.

Using the replace function is not necessarily bad, but converting collations will execute faster and will product the same results. Scalar functions are notorious as performance killers, so it is in your best interest to make them as fast as possible (and you should be pleased with the performance of this function).

Now... I said earlier that I do not know everything there is to know about collations, so there may be some unexpected. Most of the characters will be the same, but there are some difference:

€    ?
    ?
‚    '
ƒ    ƒ
„    "
…    .
†    ┼
‡    ╬
ˆ    ^
‰    %
Š    S
‹    <
Œ    O
    ?
Ž    Z
    ?
    ?
‘    '
’    '
“    "
”    "
•    
–    -
—    -
˜    ~
™    T
š    s
›    >
œ    o
    ?
ž    z
Ÿ    Y

With the data shown above, you can interpret this as: the first character will be replaced with the second character when you convert the collation as I show in the code above.

SQL PATINDEX patterns for filtering

2 Answers2