Today, I stumbled onto a weird issue with the JavaScript / ECMAScript Internationalization API for which I can't find a suitable explanation anywhere. I am getting different results when comparing two specific characters, the forward slash (`/`) and the underscore (`_`), using:

- plain-vanilla / traditional UTF-16 based comparison
- the `Intl.Collator.prototype.compare()` method
## The plain / traditional UTF-16-based comparison
```js
// Vanilla JavaScript comparator
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

console.log(cmp('/', '_'));
// Output: -1

// When sorting
const result = ['/', '_'].sort(cmp);
console.log(result);
// Output: ['/', '_']
```
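For reference, the vanilla result follows directly from the underlying code units; a quick check (not part of the original snippets) makes this concrete:

```js
// '/' is U+002F (47) and '_' is U+005F (95),
// so '/' sorts before '_' in UTF-16 code-unit order.
console.log('/'.charCodeAt(0)); // 47
console.log('_'.charCodeAt(0)); // 95
```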
## The `Intl.Collator.prototype.compare()` method
```js
const collator = new Intl.Collator('en', {
  sensitivity: 'base',
  numeric: true
});

console.log(collator.compare('/', '_'));
// Output: 1

// When sorting
const result = ['/', '_'].sort(collator.compare);
console.log(result);
// Output: ['_', '/']
```
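As an aside (my understanding of ECMA-402, not part of the original question): the `compare` getter returns a function bound to its collator, which is why it can be handed to `sort` directly:

```js
// The compare getter returns a bound function, so extracting it is safe:
const collator = new Intl.Collator('en', { sensitivity: 'base', numeric: true });
const compare = collator.compare; // no .bind(collator) needed
console.log(['/', '_'].sort(compare)); // ['_', '/']
```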
## Questions
1. Why do the two techniques yield different results?
2. Is this a bug in the ECMAScript implementation?
3. What am I missing / failing to understand here?
4. Are there other such character combinations which would yield different results for the English (`en`) language / locale? (See the sketch after this list for one way to search for them.)
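To probe the last question, here is a minimal sketch (my own exploration, assuming the same `en` collator options as above) that scans printable ASCII character pairs for disagreements between UTF-16 code-unit order and the collator's order:

```js
const collator = new Intl.Collator('en', { sensitivity: 'base', numeric: true });
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Printable ASCII characters, U+0020 through U+007E.
const chars = [];
for (let i = 0x20; i <= 0x7e; i++) chars.push(String.fromCharCode(i));

// Collect pairs where the two orderings disagree (signs normalized,
// since collator.compare only guarantees negative/zero/positive).
const mismatches = [];
for (const a of chars) {
  for (const b of chars) {
    if (Math.sign(cmp(a, b)) !== Math.sign(collator.compare(a, b))) {
      mismatches.push(a + b);
    }
  }
}
console.log(mismatches.length);         // how many pairs disagree
console.log(mismatches.includes('/_')); // true: the pair from this question
```

Note that with `sensitivity: 'base'`, case-only pairs such as `'a'` / `'A'` also show up as disagreements, since the collator treats them as equal while the code-unit comparison does not.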
## Edit 2021-10-01

As @t-j-crowder pointed out, I have replaced all occurrences of "ASCII" with "UTF-16".