0

I have nvarchar(4000) field containing data like this:

D0B6D181D0B5D0B4D0BA35D0BC (cyrillic string)
E59EA0E78999E79B98E99499 (chinese string)
...

Each character is presented by two bytes in data sequence. How to convert this data to string using T-SQL?

Anthony Faull
  • 17,549
  • 5
  • 55
  • 73
xieergai
  • 165
  • 2
  • 6
  • possible duplicate of [How to decode nvarchar to text (SQL Server 2008 R2)?](http://stackoverflow.com/questions/6263500/how-to-decode-nvarchar-to-text-sql-server-2008-r2) – Ed Harper Jun 07 '11 at 13:43

2 Answers2

1

Your first example is UTF8-encoded Cyrillic text that's been converted to a hexadecimal string and stored in an nvarchar(4000) field in SQL Server. That's quite an odd combination. Strangely, SQL Server has no native support for converting UTF8 to nvarchar in TSQL. You can roll your own DecodeUTF8 function, or you can use mine below.

Your examples:

select
  Cyrillic = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'D0B6D181D0B5D0B4D0BA35D0BC', 1))
  , Chinese = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'E59EA0E78999E79B98E99499', 1))

Output:

Cyrillic  Chinese 
жседк5м   垠牙盘错

My UTF8 decoder for TSQL:

create function [dbo].[DecodeUTF8](@utf8 varchar(max)) returns nvarchar(max)
as
begin
    declare @xml xml;

    with e2(n) as (select top(16) 0 from (values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n))
    , e3(n) as (select top(256) 0 from e2, e2 e)
    , e4(n) as (select top(65536) 0 from e3, e3 e)
    , e5(n) as (select top(power(2.,31)-1) row_number() over (order by(select 0)) from e4, e4 e)
    , numbers(i) as (select top(datalength(@utf8)) row_number() over (order by(select 0)) from e5)
    , x as (
        select *
        from numbers
        cross apply (select byte = convert(tinyint, convert(binary(1), substring(@utf8, i, 1)))) c
        cross apply (select n = floor(log(~(byte) * 2 + 1, 2)) - 1) d
        cross apply (select bytes = case when n in (5,4,3) then 7 - n else 1 end) e
        cross apply (select data = byte % power(2, n)) f
    )
    select @xml =
    (
        select nchar(case x.bytes
            when 1 then x.data
            when 2 then power(2, 6) * x.data + x2.data 
            when 3 then power(2, 6*2) * x.data + power(2, 6) * x2.data + x3.data
            when 4 then power(2, 6*3) * x.data + power(2, 6*2) * x2.data + power(2, 6) * x3.data + x4.data
          end)
        from x
        left join x x2 on x2.i = x.i + 1 and x.bytes > 1
        left join x x3 on x3.i = x.i + 2 and x.bytes > 2
        left join x x4 on x4.i = x.i + 3 and x.bytes > 3
        where x.n <> 6
        order by x.i
        for xml path('')
    );

    return @xml.value('.', 'nvarchar(max)');
end
Anthony Faull
  • 17,549
  • 5
  • 55
  • 73
0

Take a look at this

http://devio.wordpress.com/2009/07/11/convert-unicode-hex-codepoint-to-unicode-character-in-sql-server/

It looks like they are doing single byte codepoints, so you may need to modify it a bit

dfb
  • 13,133
  • 2
  • 31
  • 52