3

I'm getting json file, which I load to Azure SQL databese. This json is direct output from API, so there is nothing I can do with it before loading to DB. In that file, all Polish diactircs are escaped to "C/C++/Java source code" (based on: http://www.fileformat.info/info/unicode/char/0142/index.htm

So for example:

 ł is \u0142

I was trying to find some method to convert (unescape) those to proper Polish letters.

In worse case scenario, I can write function which will replace all combinations

Repalce(Replace(Replace(string,'\u0142',N'ł'),'\u0144',N'ń')))

And so on, making one big, terrible function...

I was looking for some ready functions like there is for URLdecode, which was answered here on stack in many topics, and here: https://www.codeproject.com/Articles/1005508/URL-Decode-in-T-SQL

Using this solution would be possible but I cannot figure out cast/convert with proper collation and types in there, to get result I'm looking for. So if anyone knows/has function that would make conversion in string for unescaping that \u this would be great, but I will manage to write something on my own if I would get right conversion. For example I tried:

 select convert(nvarchar(1), convert(varbinary, 0x0142, 1))

I made assumption that changing \u to 0x will be the answer but it gives some Chinese characters. So this is wrong direction...

Edit: After googling more I found exactly same question here on stack from @Pasetchnik: Json escape unicode in SQL Server And it looks this would be the best solution that there is in MS SQL. Onlty thing I needed to change was using NVARCHAR instead of VARCHAR that is in linked solution:

CREATE FUNCTION dbo.Json_Unicode_Decode(@escapedString nVARCHAR(MAX))
RETURNS nVARCHAR(MAX) 
AS 
BEGIN
    DECLARE @pos INT = 0,
            @char nvarCHAR,
            @escapeLen TINYINT = 2,
            @hexDigits TINYINT = 4

    SET @pos = CHARINDEX('\u', @escapedString, @pos)

    WHILE @pos > 0
    BEGIN
        SET @char =  NCHAR(CONVERT(varbinary(8), '0x' + SUBSTRING(@escapedString, @pos + @escapeLen, @hexDigits), 1))
        SET @escapedString = STUFF(@escapedString, @pos, @escapeLen + @hexDigits, @char)
        SET @pos = CHARINDEX('\u', @escapedString, @pos)
    END

    RETURN @escapedString 
END
shimon893
  • 307
  • 2
  • 10

1 Answers1

1

Instead of nested REPLACE you could use:

DECLARE @string NVARCHAR(MAX)=  N'\u0142 \u0144\u0142';

SELECT @string = REPLACE(@string,u, ch)
FROM (VALUES ('\u0142',N'ł'),('\u0144', N'ń')) s(u, ch);

SELECT @string;

DBFiddle Demo

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • 1
    Thanks, this will be helpful! But still I hope some one will know if there is some option to convert ;) For now I will prepare function that will do the job this way – shimon893 Nov 25 '17 at 15:45