0

I am using ascii conversion to remove all letters/ characters other than a-z, A-Z and 0-9 from a string. But when I am trying the same, I am getting accented letters. While giving CustomerName with 'āb' its converted to 'ab' But my expected result is just 'b'. As per my understanding accented values are taking ascii value same as their base. Please help if you have any idea. Attaching my code

CREATE FUNCTION dbo.RemoveSpecialChar (@s VARCHAR(256)) 
RETURNS VARCHAR(256)
WITH SCHEMABINDING
    BEGIN
        IF @s IS NULL
            RETURN NULL
        DECLARE @s2 VARCHAR(256) = '',
                @l INT = LEN(@s),
                @p INT = 1

        WHILE @p <= @l
            BEGIN
                DECLARE @c INT
                SET @c = ASCII(SUBSTRING(@s, @p, 1))
                IF @c BETWEEN 48 AND 57
                   OR  @c BETWEEN 65 AND 90
                   OR  @c BETWEEN 97 AND 122
                    SET @s2 = @s2 + CHAR(@c)
                SET @p = @p + 1
            END

        IF LEN(@s2) = 0
            RETURN NULL

        RETURN @s2
        end

SELECT  [dbo].[RemoveSpecialChar](CAST(Name AS nvarchar(255))) as CustomerName
from  Customers
Alias Varghese
  • 2,104
  • 3
  • 24
  • 52
  • Please show some example data for testing. – Dale K Feb 21 '19 at 06:08
  • @DaleBurrell While giving CustomerName with 'āb' its converted to 'ab' But my expected result is 'b' – Alias Varghese Feb 21 '19 at 06:13
  • You're passing in your string as `@s VARCHAR(256)` which doesn't support special characters so does an automatic conversion to `varchar`. – Dale K Feb 21 '19 at 06:40
  • `ASCII` won't work either, you'll need `UNICODE` – Dale K Feb 21 '19 at 06:57
  • @DaleBurrell his main purpose is remove characters that's not inside ASCII (only except a-z, A-Z, 0-9), so i don't see why he needs `UNICODE` here, can you explain more in an answer? – Jacky Feb 21 '19 at 07:01
  • 2
    Because when you use the `ASCII` function *ALL* characters are converted to `varchar` before the OP can test whether they should be removed or not, so `'ā'` will be converted to `'a'` and be kept when it should be discarded. As yet this isn't worthy of an answer - is basically a typo. – Dale K Feb 21 '19 at 07:05
  • 3
    The function accepts a VARCHAR. So the NVARCHAR `N'āb'` is probably already changed to 'ab' even before the ASCII check. Which should be the UNICODE function. F.e. `select N'ā' as a1, cast(N'ā' as varchar) a2, ascii(N'ā') a3, unicode(N'ā') a4` returns `ā a 97 257` – LukStorms Feb 21 '19 at 08:53

0 Answers0