
I came across this use case and I am puzzled by it:

const naturalCollator = new Intl.Collator(undefined, {
  numeric: true,
  sensitivity: 'base'
});
const comparator = (a, b) => naturalCollator.compare(a, b);

const numbers = [-1, 0, 1, 10, NaN, 2, -0.001, NaN, 0, -1, -Infinity, NaN, 5, -10, Infinity, 0];

console.log(numbers.sort(comparator));

The resulting array lists negative numbers in descending order and positive numbers in ascending order. For example:

[-3, 1, -2, 2].sort(comparator)
// [-2, -3, 1, 2]

Since Intl.Collator is a "language-sensitive string comparison", does it simply ignore the sign and evaluate every number as positive?

Edit

Another inconsistency is this one:

["b1", "a-1", "b-1", "a+1", "a1"].sort(comparator);
// ['a-1', 'a+1', 'a1', 'b-1', 'b1']

Here 'a' < 'b', so that part of the order is OK, but '-' > '+', so why is "a-1" before "a+1"?

In other words, a negative sign is considered less than a positive sign regardless of its character code, yet "-1" is considered less than "-2", as if the sign were ignored.

Yanick Rochon
  • Using a string-comparer to sort `number`-typed values is kinda pointless, no? – Dai Oct 03 '22 at 18:06
  • @Dai it is by doing stupid things that people find flaws in code. :) I'm just trying to understand the reason why the API behaves like this in this case. If it's a flaw, then you are welcome! If there is a reason for it, then I will have learned something. – Yanick Rochon Oct 03 '22 at 18:08
  • Obviously it doesn't ignore the sign, or else `-1` and `1` would be sorted together. It might sort by sign first, then by absolute value of the number. In any case, using `undefined` as the "locale" means you're using the "locale" set for your OS. That might be the same as mine (en-US), it might not, but if you log a bug to a browser or JavaScript engine, make sure you specify one. – Heretic Monkey Oct 03 '22 at 18:15
  • To really get an answer on this, we'd need to know what JavaScript implementation you're using, as the collation is implementation-dependent; the [ECMA-402 Specification](https://tc39.es/ecma402/#collator-objects) doesn't say what specific locales should do in terms of sorting. – Heretic Monkey Oct 03 '22 at 18:21
  • From what I can tell, using `numeric: true` only works with strings representing zero or positive values, not negative numbers, for which it reverts to a lexicographical ordering. The fact the input is `number[]` instead of `string[]` is a red herring, as the output is identical for both `number[]` and `number[].map( n => n.toString() )` (i.e. `string[]`). – Dai Oct 03 '22 at 18:23

1 Answer


The default string sorting algorithm (the one used when no comparator is supplied) compares the Unicode values of each code unit in the strings being compared. This is called "lexicographic sort".

When you set the collator options, you are defining specific overrides to this behavior (you can think of them as higher-priority rules above lexicographic sort).

Here's a link to the relevant spec section: https://tc39.es/ecma402/#sec-collator-comparestrings

When comparing number values (like in your example), the first step is for the numbers to be coerced to strings before they are used in the internal sort function.
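As a quick check of that coercion step, here is a minimal sketch. It assumes an explicit `en-US` locale (the original code passes `undefined`, so the runtime default is used there), and the names `asNumbers` and `asStrings` are only illustrative. Sorting the numbers directly and sorting their string forms should give the same order:

```javascript
// Assumption: 'en-US' is pinned here for reproducibility; the question uses `undefined`.
const collator = new Intl.Collator('en-US', { numeric: true, sensitivity: 'base' });

// Number inputs are converted to strings by the collator before comparison.
const asNumbers = [-3, 1, -2, 2].sort(collator.compare);

// The same values, already stringified.
const asStrings = ['-3', '1', '-2', '2'].sort(collator.compare);

console.log(asNumbers);
console.log(asStrings);
```

Both arrays come out in the same relative order, which suggests the behavior in the question comes from string collation rather than from anything specific to number values.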

When using the numeric option, the effect is only applied to characters which are classified as digits.

In the case of your stringified negative values, the hyphens are evaluated as non-numeric characters. Then the contiguous sequences of digits are evaluated as number-like groups.

You can see the effect of this when sorting other strings which begin with hyphens alongside the numbers:

const opts = { numeric: true, sensitivity: 'base' };
const naturalCollator = new Intl.Collator(undefined, opts);

const values = [-3, 1, -2, 2, '-foo', '-bar', 'foo', 'bar'];

console.log(values.sort(naturalCollator.compare));
//=> [-2, -3, "-bar", "-foo", 1, 2, "bar", "foo"]

Another example of where the numeric option is useful: Consider a series of filenames with numeric substrings intended for grouped ordering:

const opts = { numeric: true, sensitivity: 'base' };
const naturalCollator = new Intl.Collator(undefined, opts);

const fileNames = [
  'IMG_1.jpg',
  'IMG_2.jpg',
  'IMG_3.jpg',
  // ...
  'IMG_100.jpg',
  'IMG_101.jpg',
  'IMG_102.jpg',
  // ...
  'IMG_200.jpg',
  'IMG_201.jpg',
  'IMG_202.jpg',
  // etc...
];

fileNames.sort();
console.log(fileNames);
//=> ["IMG_1.jpg", "IMG_100.jpg", "IMG_101.jpg", "IMG_102.jpg", "IMG_2.jpg", "IMG_200.jpg", "IMG_201.jpg", "IMG_202.jpg", "IMG_3.jpg"]

fileNames.sort(naturalCollator.compare);
console.log(fileNames);
//=> ["IMG_1.jpg", "IMG_2.jpg", "IMG_3.jpg", "IMG_100.jpg", "IMG_101.jpg", "IMG_102.jpg", "IMG_200.jpg", "IMG_201.jpg", "IMG_202.jpg"]
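
If sign-aware ordering of actual numbers is needed, one option is a custom comparator that compares numbers by value and only falls back to the collator for everything else. The sketch below is an assumption-laden illustration, not part of the `Intl` API: `signAwareCompare` is a hypothetical helper, the `en-US` locale is pinned for reproducibility, and placing numbers before non-numbers (with `NaN` treated as a non-number) is an arbitrary design choice.

```javascript
// Assumption: 'en-US' is pinned; the original examples use the runtime default locale.
const collator = new Intl.Collator('en-US', { numeric: true, sensitivity: 'base' });

// Hypothetical helper: true for number values other than NaN.
const isComparableNumber = (v) => typeof v === 'number' && !Number.isNaN(v);

// Hypothetical comparator: numbers by value, numbers before non-numbers,
// everything else through the collator as strings.
function signAwareCompare(a, b) {
  const aNum = isComparableNumber(a);
  const bNum = isComparableNumber(b);
  if (aNum && bNum) {
    // Avoid `a - b`, which yields NaN for Infinity - Infinity.
    return a < b ? -1 : a > b ? 1 : 0;
  }
  if (aNum) return -1;
  if (bNum) return 1;
  return collator.compare(String(a), String(b));
}

const sortedNumbers = [-3, 1, -2, 2].sort(signAwareCompare);
const sortedMixed = [-3, 'foo', 1, 'bar', -Infinity].sort(signAwareCompare);

console.log(sortedNumbers); //=> [-3, -2, 1, 2]
console.log(sortedMixed);   //=> [-Infinity, -3, 1, "bar", "foo"]
```

Signed numbers embedded in strings (for example "a-1") are not handled by this sketch; they would still go through the collator and be treated as a hyphen followed by a digit run.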
jsejcksn
  • Thank you. So, I presume that the Unicode `-` is smaller than `+`, because the ASCII values are the other way around. So, in other words, even when evaluating numeric values using the collator, the positive or negative signs are ignored. There should be an option to consider the numeric sign then, as it appears that this option does not exist. – Yanick Rochon Oct 03 '22 at 18:38
  • @YanickRochon No, that's not the case. The codepoint for `-` is `45` and the codepoint for `+` is `43`. I think that ordering has to do with the locale, but I'm not certain. In the collator options you provided (also used in my answer), no locale is defined, so the default locale of the JS runtime is used. For me, that was `en-US` at the time I wrote the answer. – jsejcksn Oct 03 '22 at 18:53
  • In general, if you need to compare different values of different data types (e.g. numbers and strings), you should probably use your own comparator function (which can certainly use a collator instance internally for strings or specific matching strings). If you are only comparing numbers, then coercing them to strings seems unnecessary. – jsejcksn Oct 03 '22 at 18:59
  • I would normally write my own functions, but I thought about the JavaScript `Intl` API and was wondering if it would be better, since it does consider the current locale, etc. The use case involving `-` and `+` is what perplexes me the most. I find it quite strange that there is no option to consider the positive or negative signs of numbers when sorting like this, all the more so since `[1, 21, 2].sort(naturalCollator.compare)` will sort correctly, but not if the numbers are negative. – Yanick Rochon Oct 03 '22 at 19:07
  • @YanickRochon I haven't researched the design decisions for the numeric option, but I imagine that a primary use case is for sequences. I added an additional example to illustrate. – jsejcksn Oct 03 '22 at 19:22