1

I have a SQL column - Comments - which holds nvarchar data.

Along with the usual English characters, I can see that there are values that look like special characters, following is an example - I cannot make sense of how they got there and what they mean:

enter image description here

I tried to filter all values that contain that character using the following SQL but it returns 2010 as well as the one where there is a special character in place of 2.

where comments like N'%ဲ010%'
Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
variable
  • 8,262
  • 9
  • 95
  • 215
  • Please tag with the actual SQL database you are using. Then, give us some sample input data in the form a table. – Tim Biegeleisen Feb 12 '20 at 05:03
  • `like N'%ဲ010%'` this means if the value contains the given pattern of string – Jaydip Jadhav Feb 12 '20 at 05:06
  • Yes why would it return 2010? That special character is same as 2? – variable Feb 12 '20 at 05:11
  • verify your special character and try to get ASCII code of it, it is correct or not. – S_Sky Feb 12 '20 at 05:48
  • How to verify and get the ASCII code for it? – variable Feb 12 '20 at 06:37
  • 1
    what collation are you using? probably the same as this https://dba.stackexchange.com/questions/171331/why-does-searching-for-like-n-match-any-unicode-character-and-n-match or this https://stackoverflow.com/a/47551803/73226 – Martin Smith Feb 12 '20 at 07:08
  • By database is using: Latin1_General_CI_AI – variable Feb 12 '20 at 07:15
  • If you do `like N'%ဲ010%' COLLATE Latin1_General_100_CI_AI ` this should resolve it. Presumably [MYANMAR VOWEL SIGN AI](https://www.fileformat.info/info/unicode/char/1032/index.htm) wasn't weighted in that earlier collation – Martin Smith Feb 12 '20 at 08:49
  • How did that character arrive in the database? And how do I handle such characters? – variable Feb 12 '20 at 09:36
  • We can't tell you how they got into your database. Maybe the user entered them or maybe you have a bug in code that you haven't shown us. Regarding how to "handle" them. Depends what you mean by handling them. Applying a `COLLATE` clause as above to queries trying to specifically work with that character will probably be a part of it – Martin Smith Feb 13 '20 at 14:49
  • 1
    variable: @MartinSmith is correct regarding that character U+1032 doesn't have a sort weight defined in the version 80 collation that you're using (i.e. `Latin1_General_CI_AI`). Even testing with the next version of collations, `Japanese_90_CI_AI`, shows that it has no sort weight: `SELECT 1 WHERE NCHAR(0x1032) = NCHAR(0x0000) COLLATE Japanese_90_CI_AS; SELECT 2 WHERE NCHAR(0x1032) = N'' COLLATE Japanese_90_CI_AS;`. No sort weight means it's completely ignored. And yes, starting in the version 100 collations it (and many others that were previously missing) does have a sort weight. – Solomon Rutzky Feb 13 '20 at 18:34
  • variable: also, I have an idea on how the data got there, but I need to see the bytes of the entire string that contains the "special" character. Please update the question with the output of the following query, assuming it's not too long: `SELECT TOP (2) CONVERT(VARBINARY(MAX), [comments]) FROM dbo.{TableName} WHERE [comments] LIKE N'%' + NCHAR(0x1032) + N'%' COLLATE Latin1_General_100_BIN2;` – Solomon Rutzky Feb 13 '20 at 18:35

0 Answers0