1

Here is what I see in Notepad+++

I have a database that I'm working with that has captured some non-printable unicode characters (SS2 or Single-Shift 2). I have found a couple of functions here that are supposed to replace the bad characters with spaces but they don't seem to be working. Here is what I have that isn't working:

I'm using SQL server 2017

CREATE FUNCTION [dbo].[strip_non_printable_characters] (@S nvarchar(max))
Returns nvarchar(max)
Begin
    ;with  cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
    Select @S = Replace(@S,C,' ')
     From  cte2

    Return ltrim(rtrim(replace(replace(replace(@S,' ','†‡'),'‡†',''),'†‡',' ')))
End

and

CREATE FUNCTION [dbo].[strip_non_printable_characters] (@input_string nvarchar(max))
returns table with schemabinding as return (
  select


replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(@input_string collate Latin1_General_BIN,
    char(1), ''),char(2), ''),char(3), ''),char(4), ''),char(5), ''),char(6), ''),char(7), ''),char(8), ''),char(9), ''),char(10), ''),
    char(11), ''),char(12), ''),char(13), ''),char(14), ''),char(15), ''),char(16), ''),char(17), ''),char(18), ''),char(19), ''),char(20), ''),
    char(21), ''),char(22), ''),char(23), ''),char(24), ''),char(25), ''),char(26), ''),char(27), ''),char(28), ''),char(29), ''),char(30), ''),
    char(31), ''), char(0) , '') 
 as clean_string
);
go

When running either of these functions, the string will still show the SS2 characters. If I return it as an NVARCHAR or as a varchar it just shows the bad character as a ?. I need to replace that bad character with a space.

Can someone tell me what I am doing wrong?

Thanks!!!

UPDATE

This function gave me what I needed.

CREATE FUNCTION [dbo].[strip_non_printable_characters] (@OldString as nVarChar(4000))
RETURNS varChar(2000) AS  
BEGIN 
DECLARE @Count integer, @NewString varChar(4000)
SET @Count =1
SET @NewString = ''
WHILE @Count <= Len(@OldString)
BEGIN
    --If the character is not a regular ascii character it will be removed
    IF  UNICODE(SUBSTRING(@OldString,@Count,1)) < 127
        BEGIN
            SET @NewString  = @NewString + SUBSTRING(@OldString,@Count,1)
        END
    SET @Count = @Count + 1
END
RETURN @NewString
END
Brook
  • 21
  • 9
  • What do you mean by "not working"? Also, as a side note, neither `†` or `‡` are unicode characters, that can be represented using ASCII (characters 134 and 135 respectively). if you simply want to replace those to characters though with a whitespace though, then `TRANSLATE(YourColumn,'†‡',' ')` (there are 2 white spaces there, but you can't display that in comments) will do the job fine. – Thom A Mar 20 '19 at 15:48
  • Post a reproducible example. It's quite possible that the strings *don't* contain the characters you try to replace. There are *thousands* of characters in Unicode after all. In any case, SQL isn't a good choice for string manipulation, especially when you want to handle Unicode groups and classes. It's better to write a SQLCLR function that takes a string and inspects its characters. It's even possible to write a regex that matches specific Unicode character classes and replaces them – Panagiotis Kanavos Mar 20 '19 at 15:49
  • Sorry, it's still returning the same bad characters. I wasn't sure exactly what those were doing, I found both those queries on here: https://stackoverflow.com/questions/43148767/sql-server-remove-all-non-printable-ascii-characters – Brook Mar 20 '19 at 15:51
  • This is giving me issues on an SSRS report that has a SQL base so I don't have any option but to manipulate it in SQL, unless you know a way to do this in SSRS – Brook Mar 20 '19 at 15:56
  • I just edited the original post to show what I am seeing in NotePad++ to see the unicode characters – Brook Mar 20 '19 at 16:00
  • As the author of the CTE approach. we replace the Control Characters with an ASCII code of 0 - 31 with a space and then remove remove repeating spaces. The †‡ is just a token in the final cleanup. Any RARE combinations can be used. – John Cappelletti Mar 20 '19 at 16:10

0 Answers0