0

I am using Microsoft SQL Server 2014 and I am trying to update some of the columns in my table.

I want to replace every character of a string with another character.

For example, the word:

HELLO123

I want to replace H with T, E with Q, L with Y, O with I, 1 with 6, 2 with 7 and 3 with 8 and so for.

I'm not sure if Microsoft SQL Server 2014 supports regular expression, i.e. creating a function and looping through every character and replace will take a long time on a table with a couple of millions of rows.

Does anyone has any solution that works like regular expression and can be fast?

Thanks

Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60
yoohoo
  • 1,147
  • 4
  • 22
  • 39
  • 2
    Why not take a peek at EncryptByPassPhrase() – John Cappelletti Dec 05 '19 at 15:27
  • 1
    I don't think there are any clever ways to do it. Only brut force. – Alexey Usharovski Dec 05 '19 at 15:27
  • As John implies, what is your actual goal. Simple character replacement is easily reversed. Maybe this discussion of [data obfuscation](https://www.red-gate.com/simple-talk/sql/database-administration/obfuscating-your-sql-server-data/) might be useful. – SMor Dec 05 '19 at 15:35
  • There is no particularly simple or elegant way to do this, which is why SQL Server 2017 adds [`TRANSLATE`](https://learn.microsoft.com/sql/t-sql/functions/translate-transact-sql?view=sql-server-2017). In 2014 you're looking at things like inline table-valued functions that use recursion, and even that will be none too fast when applied to millions of rows. A CLR function is another possible alternative, though that has a learning and deployment curve. – Jeroen Mostert Dec 05 '19 at 15:51

3 Answers3

0

If you haven't noticed, the problem with REPLACE is that you'll need to nest the values, however, because you're nesting something like REPLACE(REPLACE('HELLO','H','E'),'E','Q') would return in 'QQLLO' not 'EQLLO'. As mentioned in the comments, SQL Server 2017 introduced TRANSLATE, which will only process a character once, however, as you're using 2014, you can';t use it (TRANSLATE('HELLO','HE','EQ')).

What you could do, is create a lookup table, and then split out the data into characters and rebuild it. This isn't going to be fast with a lot of data, and no, it's not going to get faster; but it'll "do the job":

--Create a table for the Cipher characters
CREATE TABLE dbo.CharCipher (InputChar char(1) NOT NULL,
                             OutputChar char(1) NOT NULL);
GO

--Add a Clustered Primary Key
ALTER TABLE dbo.CharCipher ADD CONSTRAINT PK_CharCipher PRIMARY KEY CLUSTERED (InputChar);
GO

--Ensure that the Output character us unique too    
CREATE UNIQUE NONCLUSTERED INDEX UX_CipherOutput ON dbo.CharCipher (OutputChar);
GO

--Add your Ciphers
INSERT INTO  dbo.CharCipher (InputChar,
                             OutputChar)
VALUES ('H','T'),
       ('E','Q'),
       ('L','Y'),
       ('O','I'),
       ('1','6'),
       ('2','7'),
       ('3','8');
GO

--Create a Sample table
CREATE TABLE dbo.YourTable (YourString varchar(15));
INSERT INTO dbo.YourTable (YourString)
VALUES('HELLO123');
GO

--And now the "Mess"... I mean solution
WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4)
SELECT YT.YourString,
       (SELECT ISNULL(CC.OutputChar,V.YourChar)
        FROM Tally T
             CROSS APPLY (VALUES(CONVERT(char(1),SUBSTRING(YT.YourString,T.I,1))))V(YourChar)
             LEFT JOIN dbo.CharCipher CC ON V.YourChar = CC.InputChar
        WHERE T.I <= LEN(YT.YourString)
        ORDER BY T.I
        FOR XML PATH(''),TYPE).value('.','varchar(8000)') AS NewString
FROM dbo.YourTable YT;

GO

--Clean up
DROP TABLE dbo.YourTable;
DROP TABLE dbo.CharCipher;
Thom A
  • 88,727
  • 11
  • 45
  • 75
  • hi Lamu, thanks for your help. can you explain the query briefly. for example, what does N query do in the wIth clause? you created 10 null, why 10 and not more or less? what the tally query does? i see top 8000. why 8000 and not more or less and you join 4 times with N, why 4 times? please provide some explanation. thanks in advance – yoohoo Dec 05 '19 at 18:44
  • There's no `m` in my alias, @yoohoo . [The "Numbers" or "Tally" Table: What it is and how it replaces a loop](https://www.sqlservercentral.com/articles/the-numbers-or-tally-table-what-it-is-and-how-it-replaces-a-loop-1). 8,000 because that the most a `varchar` can hold (before using `MAX`, which'll cause a performance loss). – Thom A Dec 05 '19 at 19:18
0
--this could work if the strings are all uppercase --<-- not true
--you could use nchar & foreign characters and handle everything correctly...it is just a pain to type...
declare @s varchar(20) = 'HELLO123';

--lower case everything
select @s = lower(@s)

--handle numbers with non printable characters  --> number to char() --> char() to new number
select @s = replace(replace(replace(replace(replace(replace(@s, '1', char(1)), '2', char(2)), '3', char(3)), char(1), '6'), char(2), '7'), char(3), '8')


--handle letters with case sensitive replacement (using a CS collation)
--(all letters are lowercased)lowercase letter --> new uppercase letter
select replace(replace(replace(replace(@s collate SQL_Latin1_General_CP1_CS_AS, 'h' , 'T'), 'e', 'Q'), 'l', 'Y'),  'o', 'I'); 



SELECT UPPER(REPLACE(REPLACE(LOWER('HELLO') collate SQL_Latin1_General_CP1_CS_AS, 'h','E'),'e','Q'));


GO

/*
SELECT dbo.shiftchars('0123456789::ABCDEFGHIJKLMNOPQRSTUVWXYZ::abcdefghijklmnopqrstuvwxyz');

just for fun *using a single replace()*
not for millions of rows
suitable for standard latin alphanumeric
extended ascii & non printable chars are not handled correctly.
*/

CREATE FUNCTION dbo.shiftchars(@s VARCHAR(8000))
RETURNS VARCHAR(8000)
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN
        SELECT @s = REPLACE(@s, f collate SQL_Latin1_General_CP1_CS_AS , t)
        FROM
        (
        VALUES
            (1, '1', CHAR(1)),
            (1, '2', CHAR(2)),
            (1, '3', CHAR(3)),
            (1, '4', CHAR(4)),
            (1, '5', CHAR(5)),
            (1, '6', CHAR(6)),
            (1, '7', CHAR(7)),
            (1, '8', CHAR(8)),
            (1, '9', CHAR(9)),
            (1, '0', CHAR(254)),
            (1, 'A', CHAR(128)),
            (1, 'B', CHAR(129)),
            (1, 'C', CHAR(130)),
            (1, 'D', CHAR(131)),
            (1, 'E', CHAR(132)),
            (1, 'F', CHAR(133)),
            (1, 'G', CHAR(134)),
            (1, 'H', CHAR(135)),
            (1, 'I', CHAR(136)),
            (1, 'J', CHAR(137)),
            (1, 'K', CHAR(138)),
            (1, 'L', CHAR(139)),        
            (1, 'M', CHAR(140)),
            (1, 'N', CHAR(141)),
            (1, 'O', CHAR(142)),        
            (1, 'P', CHAR(143)),
            (1, 'Q', CHAR(144)),
            (1, 'R', CHAR(145)),        
            (1, 'S', CHAR(146)),
            (1, 'T', CHAR(147)),    
            (1, 'U', CHAR(148)),        
            (1, 'V', CHAR(149)),
            (1, 'W', CHAR(150)),
            (1, 'X', CHAR(151)),        
            (1, 'Y', CHAR(152)),
            (1, 'Z', CHAR(153)),
            (1, 'a', CHAR(154)),
            (1, 'b', CHAR(155)),
            (1, 'c', CHAR(156)),
            (1, 'd', CHAR(157)),
            (1, 'e', CHAR(158)),
            (1, 'f', CHAR(159)),
            (1, 'g', CHAR(160)),
            (1, 'h', CHAR(161)),
            (1, 'i', CHAR(162)),
            (1, 'j', CHAR(163)),
            (1, 'k', CHAR(164)),
            (1, 'l', CHAR(165)),        
            (1, 'm', CHAR(166)),
            (1, 'n', CHAR(167)),
            (1, 'o', CHAR(168)),        
            (1, 'p', CHAR(169)),
            (1, 'q', CHAR(170)),
            (1, 'r', CHAR(171)),        
            (1, 's', CHAR(172)),
            (1, 't', CHAR(173)),    
            (1, 'u', CHAR(174)),        
            (1, 'v', CHAR(175)),
            (1, 'w', CHAR(176)),
            (1, 'x', CHAR(177)),        
            (1, 'y', CHAR(178)),
            (1, 'z', CHAR(179)),
            --------------------
            (2, CHAR(1), '6'),
            (2, CHAR(2), '7'),
            (2, CHAR(3), '8'),
            (2, CHAR(4), '9'),
            (2, CHAR(5), '0'),
            (2, CHAR(6), '1'),
            (2, CHAR(7), '2'),
            (2, CHAR(8), '3'),
            (2, CHAR(9), '4'),
            (2, CHAR(254), '5'),
            (2, CHAR(128), 'M'),
            (2, CHAR(129), 'N'),
            (2, CHAR(130), 'O'),
            (2, CHAR(131), 'P'),
            (2, CHAR(132), 'Q'),
            (2, CHAR(133), 'R'),
            (2, CHAR(134), 'S'),
            (2, CHAR(135), 'T'),
            (2, CHAR(136), 'U'),
            (2, CHAR(137), 'V'),
            (2, CHAR(138), 'W'),
            (2, CHAR(139), 'X'),            
            (2, CHAR(140), 'Y'),
            (2, CHAR(141), 'Z'),
            (2, CHAR(142), 'A'),            
            (2, CHAR(143), 'B'),
            (2, CHAR(144), 'C'),
            (2, CHAR(145), 'D'),            
            (2, CHAR(146), 'E'),
            (2, CHAR(147), 'F'),        
            (2, CHAR(148), 'G'),            
            (2, CHAR(149), 'H'),
            (2, CHAR(150), 'I'),
            (2, CHAR(151), 'J'),            
            (2, CHAR(152), 'K'),
            (2, CHAR(153), 'L'),
            (2, CHAR(154), 'm'),
            (2, CHAR(155), 'n'),
            (2, CHAR(156), 'o'),
            (2, CHAR(157), 'p'),
            (2, CHAR(158), 'q'),
            (2, CHAR(159), 'r'),
            (2, CHAR(160), 's'),
            (2, CHAR(161), 't'),
            (2, CHAR(162), 'u'),
            (2, CHAR(163), 'v'),
            (2, CHAR(164), 'w'),
            (2, CHAR(165), 'x'),            
            (2, CHAR(166), 'y'),
            (2, CHAR(167), 'z'),
            (2, CHAR(168), 'a'),            
            (2, CHAR(169), 'b'),
            (2, CHAR(170), 'c'),
            (2, CHAR(171), 'd'),            
            (2, CHAR(172), 'e'),
            (2, CHAR(173), 'f'),        
            (2, CHAR(174), 'g'),            
            (2, CHAR(175), 'h'),
            (2, CHAR(176), 'i'),
            (2, CHAR(177), 'j'),            
            (2, CHAR(178), 'k'),
            (2, CHAR(179), 'l')                 
        ) AS v(o, f, t)
        ORDER BY o;

    RETURN (@s);

END
lptr
  • 1
  • 2
  • 6
  • 16
0

It is possible to do such scramble on Sql Server 2014.
Even without a UDF or a CLR.

Here's a method that uses an OUTER APPLY on a FOR XML to unfold & replace the characters for the [0-9A-Za-z] ranges.

Sample data:

create table test 
(
  id int identity(1,1) primary key,
  col nvarchar(42)
);

insert into test (col) values
(N'HELLO 0123'),
(N'01234π56789'),
(N'abcdefghijklm>nopqrstuvwxyz'),
(N'ABCDEFGHIJKLM✓NOPQRSTUVWXYZ');

Num num:

--
-- Temporary tally table with numbers
-- Will be used to unfold that characters
--
if object_id('tempdb..#nums') is not null
  drop table #nums;

create table #nums (n int primary key);

with rcte as
(
 select 1 n, max(len(col)) max_n
 from test
 union all 
 select n+1, max_n
 from rcte 
 where n <= max_n
)
insert #nums (n)
select n 
from rcte
option (maxrecursion 4000);

Query:

select t.*, a.scramble
from test t
outer apply
(
  select q.x.value('.','NVARCHAR(MAX)') as scramble
  from
  (
    select
     case 
     when substring(col,n,1) between N'0' and N'9'
     then substring(
      N'5678901234',charindex(substring(col,n,1),
      N'0123456789'),1)
     when unicode(substring(col,n,1)) between unicode(N'a') and unicode(N'z')
     then substring(
      N'nomrqputswvyzxiacbedghfjlk',charindex(substring(col,n,1),
      N'abcdefghijklmnopqrstuvwxyz'),1)
     when unicode(substring(col,n,1)) between unicode(N'A') and unicode(N'Z')
     then substring(
      N'NOMRQPUTSWVYZXIACBEDGHFJLK',charindex(substring(col,n,1),
      N'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),1)
     else substring(col,n,1)
     end [text()]
    from #nums
    where n between 1 and len(col)
    order by n
    for xml path (''), type
  ) q(x)
  where q.x is not null
) a;

Result:

id | col                           | scramble                     
-: | :---------------------------- | :----------------------------
 1 | HELLO 0123                    | TQYYI 5678                   
 2 | 01234π56789                  | 56789π01234                 
 3 | abcdefghijklm>nopqrstuvwxyz   | nomrqputswvyz>xiacbedghfjlk  
 4 | ABCDEFGHIJKLM✓NOPQRSTUVWXYZ | NOMRQPUTSWVYZ✓XIACBEDGHFJLK

A test on db<>fiddle here

--

A solution that's more specific for VARCHAR:

select t.*, a.scramble
from test t
outer apply
(
  select q.x.value('.', 'VARCHAR(MAX)') as scramble
  from
  (
    select
     case 
     when substring(col,n,1) between '0' and '9'
     then substring(
      '5678901234',charindex(substring(col,n,1),
      '0123456789'),1)
     when ascii(substring(col,n,1)) between ascii('a') and ascii('z')
     then substring(
      'nomrqputswvyzxiacbedghfjlk',charindex(substring(col,n,1),
      'abcdefghijklmnopqrstuvwxyz'),1)
     when ascii(substring(col,n,1)) between ascii('A') and ascii('Z')
     then substring(
      'NOMRQPUTSWVYZXIACBEDGHFJLK',charindex(substring(col,n,1),
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),1)
     else substring(col,n,1)
     end
    from #nums
    where n between 1 and len(col)
    order by n
    for xml path (''), type
  ) q(x)
  where q.x is not null
) a;

Or, a solution that does a rotation scramble:

select t.*, a.scramble
from test t
outer apply
(
  select q.x.value('.', 'VARCHAR(MAX)') as scramble
  from
  (
    select 
    case
    when substring(col,n,1) between '0' and '9'
    then char(ascii('0')+(ascii(substring(col,n,1))-ascii('0')+5)%10)
    when ascii(substring(col,n,1)) between ascii('a') and ascii('z')
    then char(ascii('a')+(ascii(substring(col,n,1))-ascii('a')+13)%26)
    when ascii(substring(col,n,1)) between ascii('A') and ascii('Z')
    then char(ascii('A')+(ascii(substring(col,n,1))-ascii('A')+13)%26)
    else substring(col,n,1)
    end
    from #nums
    where n between 1 and len(col)
    order by n
    for xml path (''), type
  ) q(x)
) a

A test on db<>fiddle here

Note that for a Sql Server 2017+ solution, a STRING_SPLIT could replace the FOR XML. But then again, one could simply use TRANSLATE then.

Example:

UPDATE test
SET col = TRANSLATE(col, 
           '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' collate Latin1_General_CS_AS, 
           '5678901234nomrqputswvyzxiacbedghfjlkNOMRQPUTSWVYZXIACBEDGHFJLK');
LukStorms
  • 28,916
  • 5
  • 31
  • 45
  • Hi LukStorms, thanks for your help. i am trying to understand the purpose of #nums table. can you explain? if i have to apply the same logic to another column in my data table then do i need to create another table like #nums? – yoohoo Dec 06 '19 at 20:52
  • @yoohoo It's just a simple temporary tally table with numbers. It's used for the substring to get the n'th character. In fact, suppose you know that the strings won't have a length over N characters, then you could create a permanent table with only numbers. The insert I used, limits it to the max length used in the column, but you can simply tweak so it loads f.e. 1000 numbers in it. – LukStorms Dec 06 '19 at 21:00
  • Btw, if you're gonna replace multiple columns on the same table? Then maybe it's worth the effort to write a CLR that does something like that TRANSLATE function. F.e. [this SO post](https://stackoverflow.com/q/25136636/4003419). – LukStorms Dec 06 '19 at 21:23
  • @yoohoo Fyi, the answer was updated. A fix for the special XML characters. – LukStorms Dec 13 '19 at 21:11