My answer is to use "en" with localeCompare.
Based on my tests below, en produces the most consistent results.
Here is an example of using that with localeCompare:
"å".localeCompare("ä", "en")
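When the same comparison drives a whole sort, MDN recommends creating one Intl.Collator and reusing its compare function instead of calling localeCompare repeatedly. Per ECMA-402, localeCompare with a locale argument constructs a Collator for that locale internally, so the order should be the same. The following is a minimal, untested sketch of that for "en"; the helper name sortEnglish is only for illustration.

```javascript
// Build the collator once. Its compare getter returns a function already
// bound to the collator, so it can be passed straight to sort().
const enCollator = new Intl.Collator("en");

// Hypothetical helper: sort a copy so the caller's array is not reordered
function sortEnglish(strings) {
  return [...strings].sort(enCollator.compare);
}

console.log(sortEnglish(["z", "ä", "a", "å", "e"]).join(""));
```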
By my understanding, und (short for "undetermined") and perhaps en-US-u-va-posix should be better answers to the original question. However, und seems to behave differently in Firefox, and the behavior of en-US-u-va-posix in NodeJS depends on whether the ICU data is present.
If you are not hitting the Firefox problem, consider using und, or just the parameterless form, to convey the idea that the locale is not important.
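To see which locale each of these forms actually resolves to on a given runtime, one option is to ask Intl.Collator for its resolved locale, since localeCompare with a locale argument goes through the same locale negotiation. This is a sketch; the helper name resolvedCollationLocale is hypothetical, and what it prints for the parameterless form and und depends on the runtime's default locale and ICU data.

```javascript
// Hypothetical helper: report which locale the runtime picks for a tag.
// Passing undefined mimics calling localeCompare without a locale argument.
function resolvedCollationLocale(tag) {
  return new Intl.Collator(tag).resolvedOptions().locale;
}

for (const tag of [undefined, "und", "en"]) {
  console.log(`${tag ?? "(no locale)"} -> ${resolvedCollationLocale(tag)}`);
}
```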
Some related things I learned while researching this (adding them here just in case somebody else is heading down the same rabbit hole):
- This very popular answer about using localeCompare, with lots of useful comments.
- From this old post I learned that there are some "special" locales.
- With older browsers there may be no hope. Quoting the Mozilla documentation: "In older implementations, which ignore the locales and options arguments, the locale and sort order used are entirely implementation dependent."
- NodeJS seems to use the ICU project's icu4c library under the hood for locale-related functionality.
- The icu-project has this online tool to experiment with collation orders.
- The plain sort() is different from the other options, but seems stable across the platforms I tested.
- The characters were triggering some kind of bug on older versions of NodeJS.
- The locale data is not present in my Node installation, and installing the ICU data changes the behavior of localeCompare.
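For the point about older implementations that ignore the locales argument, MDN describes a feature check along these lines: pass a structurally invalid tag and see whether a RangeError is thrown. An implementation that ignores the argument just compares the strings and does not throw. A minimal sketch of that check (the function name is illustrative):

```javascript
// Implementations that honour the locales argument must reject the invalid
// tag "i" with a RangeError; ones that ignore it simply compare the strings.
function localeCompareSupportsLocales() {
  try {
    "foo".localeCompare("bar", "i");
  } catch (e) {
    return e instanceof RangeError;
  }
  return false;
}

console.log(localeCompareSupportsLocales());
```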
Here is the test code I ended up using:
const testArray = ["Ă","Â","Î","Ș","Ț","A","i","I","S","T","é","e","ä","a","","","Д","д","å","z"]
const locales = ["POSIX", "en-US-u-va-posix", "und", "en", "da", "ru"]
console.log(`${testArray.sort().join("")} sort()`)
console.log(`${testArray.sort((a, b) => a.localeCompare(b)).join("")} localeCompare(x)`)
locales.forEach(locale => {
  const f = (a, b) => a.localeCompare(b, locale)
  try {
    console.log(`${testArray.sort(f).join("")} ${locale}`)
  } catch (e) {
    // localeCompare throws a RangeError for a structurally invalid locale tag
    console.log(`${locale}: ${e}`)
  }
})
On my Mac, with NodeJS 13.5.0 and full-icu, I get this output:
AISTaeizÂÎäåéĂȘȚДд sort()
aAĂÂåäeéiIÎSȘTȚzдД localeCompare(x)
aAĂÂåäeéiIÎSȘTȚzдД POSIX
AĂÂIÎSȘTȚaåäeéizдД en-US-u-va-posix
aAĂÂåäeéiIÎSȘTȚzдД und
aAĂÂåäeéiIÎSȘTȚzдД en
AaĂÂeéIiÎSȘTȚzäåДд da
дДaAĂÂåäeéiIÎSȘTȚz ru
Node v12.14.0 gets the same result.
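The plain sort() line is identical everywhere because, without a comparator, Array.prototype.sort converts elements to strings and orders them by UTF-16 code units, with no locale involved. That is why uppercase ASCII letters come before lowercase ones and, in this test set, every accented or Cyrillic letter ends up after z. A small sketch reproducing that order with an explicit comparator (byCodeUnits is an illustrative name):

```javascript
// Without a comparator, sort() orders strings by UTF-16 code unit values.
// The relational operators on strings compare code units the same way.
const byCodeUnits = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

const letters = ["é", "z", "a", "A", "Д", "Ă"];
console.log([...letters].sort().join(""));
console.log([...letters].sort(byCodeUnits).join(""));
```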
Without NODE_ICU_DATA set, v12.14.0 gives:
AISTaeizÂÎäåéĂȘȚДд sort()
aAĂÂåäeéiIÎSȘTȚzдД localeCompare(x)
aAĂÂåäeéiIÎSȘTȚzдД POSIX
aAĂÂåäeéiIÎSȘTȚzдД en-US-u-va-posix
aAĂÂåäeéiIÎSȘTȚzдД und
aAĂÂåäeéiIÎSȘTȚzдД en
aAĂÂåäeéiIÎSȘTȚzдД da
aAĂÂåäeéiIÎSȘTȚzдД ru
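When every locale prints the same line, a plausible explanation (an assumption, not something these outputs prove) is that the runtime lacks collation data for those locales and falls back to one default order. Intl.Collator.supportedLocalesOf returns only the requested tags the runtime can serve without falling back to its default locale, so it can be used to check this. An untested sketch:

```javascript
// supportedLocalesOf returns the subset of the requested tags for which the
// runtime has collation data, without falling back to the default locale.
const tested = ["en", "da", "ru"];
const supported = Intl.Collator.supportedLocalesOf(tested);

for (const tag of tested) {
  const status = supported.includes(tag) ? "has data" : "falls back";
  console.log(`${tag}: ${status}`);
}
```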
My Chrome browser gives this result:
AISTaeizÂÎäåéĂȘȚДд sort()
aAĂÂåäeéiIÎSȘTȚzдД localeCompare(x)
aAĂÂåäeéiIÎSȘTȚzдД POSIX
aAĂÂåäeéiIÎSȘTȚzдД en-US-u-va-posix
aAĂÂåäeéiIÎSȘTȚzдД und
aAĂÂåäeéiIÎSȘTȚzдД en
AaĂÂeéIiÎSȘTȚzäåДд da
дДaAĂÂåäeéiIÎSȘTȚz ru
The Safari browser on my Mac gives the same result except for da:
aAĂÂeéiIÎSȘTȚzäåдД da
Firefox on my Mac gives this somewhat different result:
AISTaeizÂÎäåéĂȘȚДд sort()
aAĂÂeéiIÎSȘTȚzåäдД localeCompare(x)
aAĂÂeéiIÎSȘTȚzåäдД POSIX
aAĂÂåäeéiIÎSȘTȚzдД en-US-u-va-posix
aAĂÂeéiIÎSȘTȚzåäдД und
aAĂÂåäeéiIÎSȘTȚzдД en
AaĂÂeéIiÎSȘTȚzäåДд da
дДaAĂÂåäeéiIÎSȘTȚz ru
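The test code above sorts testArray in place, so each line starts from the previous line's order; with these deterministic comparators that should not change the printed results. To compare locales programmatically instead of reading the outputs by eye, a non-mutating variant could look like this sketch. sortWith and sameOrder are hypothetical helper names, and the results depend on the runtime's ICU data.

```javascript
// Hypothetical helper: sort a copy of the input with a given locale.
function sortWith(strings, locale) {
  const collator = new Intl.Collator(locale);
  return [...strings].sort(collator.compare);
}

// Hypothetical helper: true when two locales produce the same order.
function sameOrder(strings, localeA, localeB) {
  const a = sortWith(strings, localeA);
  const b = sortWith(strings, localeB);
  return a.every((s, i) => s === b[i]);
}

const sample = ["Ă", "Â", "A", "a", "é", "e", "ä", "å", "z", "Д", "д"];
for (const locale of ["und", "da", "ru"]) {
  console.log(`${locale} same as en: ${sameOrder(sample, locale, "en")}`);
}
```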
There is also the function Intl.getCanonicalLocales. Here are some results I found testing it:
$ node
> Intl.getCanonicalLocales("en-US-POSIX")
[ 'en-US-u-va-posix' ]
- On recent Chrome it works as in NodeJS above
- On recent Safari getCanonicalLocales seems to accept almost any string and returns that string
- Recent Firefox is the same as recent Safari
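Given those differences, one defensive option is to canonicalize a tag before using it and treat a RangeError as "not a valid tag". The helper below is a hypothetical sketch; on engines that accept almost any string, as described above for Safari and Firefox, it will likely just return the string it was given. The commented results are for Node.

```javascript
// Hypothetical helper: return the canonical form of a single locale tag,
// or null if the runtime rejects it as structurally invalid.
function canonicalTag(tag) {
  try {
    const [canonical] = Intl.getCanonicalLocales(tag);
    return canonical ?? null;
  } catch (e) {
    if (e instanceof RangeError) return null;
    throw e;
  }
}

console.log(canonicalTag("EN-us")); // "en-US"
console.log(canonicalTag("en_US")); // null (underscore is not a valid separator)
console.log(canonicalTag("en-US-POSIX"));
```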