2

For characters that are do not have an explicit order that depends on the collation (i.e., that are not case-folded and/or do not have accents stripped), the ordering is based on the ordinal value of the code point in the base character set. If one string is a prefix of another, the shorter string collates first.

The first 128 code points in Unicode are the same as the ASCII character set, which means / comes after -. ---- From cco's answer.

Then why would the following happen?

Demo: How '2017/8/22 1:33:53' precedes '2017-08-13 23:12:33.411'

I expect '2017/8/22 1:33:53' to come after '2017-08-13 23:12:33.411' because / comes after -. Is this because special ordering in Chinese_PRC_CI_AS collation? Where can I find the specification if that's the case?

Community
  • 1
  • 1
Ryan
  • 1,040
  • 3
  • 16
  • 25
  • The last thing you want to do is compare datetimes wrapped in strings. Cast or convert those strings to the `DATETIME` type before comparing. Better yet, don't store datetimes as strings. – TT. Aug 22 '17 at 06:42
  • @TT. Thanks for the reminder! I certainly won't do that if the database was made by me... But anyway, any idea on the question (specifically why such a weird comparison result happens)? Thanks! – Ryan Aug 22 '17 at 07:09
  • Length is definitely taken into account, that is not dependant on the collation. Execute the following: `DECLARE @t TABLE (v NVARCHAR(3)); INSERT INTO @t(v)VALUES('1'),('01'),('10'),('2'),('3'),('300'),('301'); SELECT*FROM @t ORDER BY v;`. You'll see _10_ coming before _2_. You can't compare datetimes in strings if the length of those strings can vary, or if other characters are used as separators. – TT. Aug 22 '17 at 07:20
  • 1
    Please view this question as why the string '2017/8/22 1:33:53' precedes '2017-08-13 23:12:33.411'. Never mind it is actually datetime. When we are comparing strings the former should come AFTER the latter, while it is not in the pic above. – Ryan Aug 22 '17 at 07:36

3 Answers3

3

I expect '2017/8/22 1:33:53' to come after '2017-08-13 23:12:33.411' because / comes after -. Is this because special ordering in Chinese_PRC_CI_AS collation?

Yes it's because of Chinese_PRC_CI_AS collation

Where can I find the specification if that's the case?

The rules are complicated enough but you can find them here: UNICODE COLLATION ALGORITHM

I extract some text from it as a picture to give you an idea: enter image description here

In your case you can try to use a binary collation Chinese_PRC_BIN2, it will give you the desirable result in this particular case, but it can be non acceptable for letters sorting, I mean in european languages binary sort will always put Upper case letters before all the lower case letters, but I have no idea how Chinese symbols should be sorted

Here is the code for sort your dates with your order (you said you use UNICODE columns):

declare @t table (s nvarchar(100))
insert into @t values (N'2017/8/22 1:33:53'),  (N'2017-08-13 23:12:33.411')

select *
from @t
order by s collate Chinese_PRC_BIN2;

Bin2 collation will work in "non unicode case" as your example in the picture, too:

select case 
      when '2017/8/22 1:33:53' collate Chinese_PRC_BIN2 < 
           '2017-08-13 23:12:33.411' collate Chinese_PRC_BIN2 
      then 'TRUE' 
      else 'FALSE' 
   end;

enter image description here

sepupic
  • 8,409
  • 1
  • 9
  • 20
1

It is not only about ASCII codes.

Yes, ASCII code of / is 0x2F and ASCII code of - is 0x2D, but string comparison rules depend on collation and these rules can be quite complex, taking into account not only values of the character code.

The - symbol is treated in a special way in some collations.

Here is a full example:

DECLARE @T1 TABLE (Value varchar(100) COLLATE Chinese_PRC_CI_AS);
DECLARE @T2 TABLE (Value varchar(100) COLLATE Latin1_General_CI_AS);
DECLARE @T3 TABLE (Value varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS);

INSERT INTO @T1 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

INSERT INTO @T2 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

INSERT INTO @T3 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

SELECT * FROM @T1 ORDER BY Value;
SELECT * FROM @T2 ORDER BY Value;
SELECT * FROM @T3 ORDER BY Value;

Result

T1 (Chinese_PRC_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abcdef  |
| abc-def |
+---------+

T2 (Latin1_General_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abcdef  |
| abc-def |
+---------+

T3 (SQL_Latin1_General_CP1_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abc-def |
| abcdef  |
+---------+

Note, that SQL collation in the third table produced result in a different order.


Note also, that if you change the column type to nvarchar, the effect disappears in this specific example. In other words, rules for treating the - symbol depend on collation and on the type of the value.

Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • In my case the column and data type is of nvarchar. And the 'issue' (kind of issue for me, but may be by design from MS's perspective) persists after explicitly converting the string to nvarchar. – Ryan Aug 22 '17 at 08:51
  • Maybe also due to the "complex rules" you mentioned? Where can I find those rules? – Ryan Aug 22 '17 at 08:52
  • Sorry, Ryan. I don't know where to find detailed description of sorting rules for a collation. Try to ask another specific question. Maybe somebody would answer. – Vladimir Baranov Aug 22 '17 at 09:12
  • Thanks @Vladimir your answer is still very informative! – Ryan Aug 22 '17 at 13:42
0

For characters that are do not have an explicit order that depends on the collation (i.e., that are not case-folded and/or do not have accents stripped), the ordering is based on the ordinal value of the code point in the base character set. If one string is a prefix of another, the shorter string collates first.
The first 128 code points in Unicode are the same as the ASCII character set, which means / comes after -.

cco
  • 5,873
  • 1
  • 16
  • 21
  • Then why August 22 comes before august 13(in the previous example), as is tested on sql server 2016 and 2008? – Ryan Aug 22 '17 at 04:26
  • I edited my question to be clearer. Please see the updated version. Thanks! – Ryan Aug 22 '17 at 07:55
  • @cco, Windows and SQL collations may treat `-` symbol differently for string ordering purposes, see example in my answer. – Vladimir Baranov Aug 22 '17 at 08:35