0

I have some rows in my SQL table that contain single byte space characters, and some that contain double byte space characters. I need to identify rows with double byte characters to fix them in the user interface.

My first thought is to use charindex and substring to isolate the space from the rest of the text (SUBSTRING(@SomeField, charindex(' ', @SomeField),1)) and then use DataLength to see if it was double byte or single byte, but it appears that's not going to work. It appears it's giving the same value for both single-byte and double-byte spaces.

SELECT  DATALENGTH(' ') --double byte space
, DATALENGTH(' ') --single byte space
, LEN(' ') --double byte space
, LEN(' ') --single byte space
DanB
  • 2,022
  • 1
  • 12
  • 24
  • 2
    Just use LIKE on the column that may have the double space. E.g. SELECT * FROM YourTable WHERE DoubleSpaceColName LIKE '% %'; – John Sheridan Aug 30 '19 at 17:36
  • Unfortunately, that is still finding both single byte and double byte spaces. Both `CASE WHEN FieldWithDB LIKE ('% %') THEN 1 ELSE 0 END` and `CASE WHEN FieldWithDB LIKE ('% %') THEN 1 ELSE 0 end` return a value of 1. – Derick Stephensen Aug 30 '19 at 19:09
  • @DerickStephensen By "double-byte white space" do you mean something other than two ordinary consecutive spaces equivalent to ASCII character 32? – Andrew Morton Aug 30 '19 at 19:24
  • And you can always `cast(col as varbinary(200))` to examine exactly what code points are stored, and share the results here. – David Browne - Microsoft Aug 30 '19 at 19:38
  • @AndrewMorton Correct. I'm trying to search a name field that contains English and Japanese entries. English entries have the first and last name separated by a regular space (ascii 32). Some of the Japanese names are separated by a double byte white space character. Here's an example I found on another forum that might help to clarify: " " <--- this is a single-byte white space. " " <--- this is a double-byte white space. "日本 観光" <--- this is a search phrase which contains a single-byte white space. "日本 観光" <--- this is a search phrase which contains a double-byte white space. – Derick Stephensen Aug 30 '19 at 19:43
  • @DavidBrowne-Microsoft Forgive me, i'm not real familiar with varbinary and don't really know what i'm looking for. This is the output when I cast an example value as varbinary: 0x9F53307500306E6F – Derick Stephensen Aug 30 '19 at 19:58

1 Answers1

0

That character U+3000 or select cast(0x0030 as nchar(1)) is apparantly called "Ideographic Space".

You simply need to use unicode literals when using it in code. Both like and charindex work fine. But since it's semantically just a space, most collations will not distinguish between U+0020 (a regular space) and U+3000. So you also need to use a binary collation (or have your database or column set to use one). EG

select 1 where N'原田 潮' like N'% %' collate SQL_Latin1_General_CP437_BIN2

select charindex(N' ' collate SQL_Latin1_General_CP437_BIN2, N'原田 潮') 


select 1 where N'原田 潮' like N'% %' collate SQL_Latin1_General_CP437_BIN2

select charindex(N' ' collate SQL_Latin1_General_CP437_BIN2 , N'原田 潮')

outputs

-----------
1


-----------
3


-----------


-----------
0
David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67