I would like to have a stable, well-defined sort order in JavaScript.

It could be any defined locale, but the point is to have the code say "just some neutral sort order used here".

Java has its ROOT locale and C# its Neutral culture, both of which have been used for this purpose.

The Mozilla documentation does not seem to offer such an option, at least not at first glance.

Peter Lamberg

2 Answers

My answer is to use "en" with localeCompare.

Based on my tests below, en produces the most consistent results.

Here is an example of using that with localeCompare:

"å".localeCompare("ä", "en")

As I understand it, und (short for undetermined), and maybe en-US-u-va-posix, should be better answers to the original question. However, und seems to behave differently in Firefox, and the behavior of en-US-u-va-posix in NodeJS depends on whether the ICU data is present.

If you are not hitting the Firefox problem, consider using und, or just the parameterless form, to convey the idea that the locale is not important.
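A minimal sketch of how the "en" choice could be applied when sorting a whole array. It uses Intl.Collator, which accepts the same locales argument as localeCompare; the names neutralCollator and sortNeutral are made up for this example, and the result still depends on the collation data the runtime ships.

```javascript
// Hypothetical names; "en" is the locale chosen in this answer.
const neutralCollator = new Intl.Collator("en");

// Returns a sorted copy so the caller's array is left untouched.
function sortNeutral(strings) {
    return [...strings].sort(neutralCollator.compare);
}

console.log(sortNeutral(["z", "B", "a"]).join(""));
```

Creating the collator once and reusing its compare function avoids passing the locale on every comparison.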

Some related things I learned while researching this (adding them here just in case somebody else is heading down the same rabbit hole):

  • This very popular answer about using localeCompare with lots of useful comments.
  • From this old post I learned that there are some "special" locales.
  • With older browsers there may be no hope. Quoting the Mozilla documentation: "In older implementations, which ignore the locales and options arguments, the locale and sort order used are entirely implementation dependent."
  • NodeJS seems to use the Project ICU icu4c library under the hood for locale related functionality.
  • The icu-project has this online tool to experiment with collation orders.
  • The plain sort() is different from the other options, but seems stable across the platforms I tested.
  • The characters were triggering some kind of bug on an older version of NodeJS.
  • The locale data is not present in my Node installation, and installing the ICU data changes the behavior of localeCompare.
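Since the behavior depends on which locale data is installed, it may help to ask the runtime what it actually resolves for a given tag. This is only a sketch: describeCollation is a hypothetical helper name, and what it reports will vary between installations.

```javascript
// Hypothetical helper: reports whether the runtime claims support for a
// requested locale tag and which locale the collator actually resolved to.
function describeCollation(requested) {
    const supported = Intl.Collator.supportedLocalesOf([requested]);
    const resolved = new Intl.Collator(requested).resolvedOptions().locale;
    return { requested, supported: supported.length > 0, resolved };
}

["en", "da", "ru"].forEach(locale => console.log(describeCollation(locale)));
```

If the resolved locale differs from the requested one, the sort order is probably not the one the tag was meant to select.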

Here is the test code I ended up using:

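// Mixed Latin, Romanian, Nordic, and Cyrillic characters to expose ordering differences.
const testArray = ["Ă","Â","Î","Ș","Ț","A","i","I","S","T","é","e","ä","a","","","Д","д", "å", "z"]
// Locale tags to compare.
const locales = ["POSIX", "en-US-u-va-posix", "und", "en", "da", "ru"]

// Default sort(): compares UTF-16 code units, no locale involved.
console.log(`${[...testArray].sort().join("")} sort()`)

// localeCompare without a locale argument: uses the runtime's default locale.
console.log(`${[...testArray].sort((a, b) => a.localeCompare(b)).join("")} localeCompare(x)`)

locales.forEach(locale => {
    const f = (a, b) => a.localeCompare(b, locale)
    try {
        // Sort a copy so every run starts from the same input order.
        console.log(`${[...testArray].sort(f).join("")} ${locale}`)
    } catch (e) {
        // An invalid locale tag makes localeCompare throw a RangeError.
        console.log(`${locale}: ${e}`)
    }
})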

On my Mac with NodeJS 13.5.0 with full-icu I get this output:

AISTaeizÂÎäåéĂȘȚДд sort()
aAĂÂåäeéiIÎSȘTȚzдД localeCompare(x)
aAĂÂåäeéiIÎSȘTȚzдД POSIX
AĂÂIÎSȘTȚaåäeéizдД en-US-u-va-posix
aAĂÂåäeéiIÎSȘTȚzдД und
aAĂÂåäeéiIÎSȘTȚzдД en
AaĂÂeéIiÎSȘTȚzäåДд da
дДaAĂÂåäeéiIÎSȘTȚz ru

Node v12.14.0 gets the same result.

Without NODE_ICU_DATA set, v12.14.0 gives:

AISTaeizÂÎäåéĂȘȚДд sort()
aAĂÂåäeéiIÎSȘTȚzдД localeCompare(x)
aAĂÂåäeéiIÎSȘTȚzдД POSIX
aAĂÂåäeéiIÎSȘTȚzдД en-US-u-va-posix
aAĂÂåäeéiIÎSȘTȚzдД und
aAĂÂåäeéiIÎSȘTȚzдД en
aAĂÂåäeéiIÎSȘTȚzдД da
aAĂÂåäeéiIÎSȘTȚzдД ru

My Chrome browser gives this result:

AISTaeizÂÎäåéĂȘȚДд sort()
aAĂÂåäeéiIÎSȘTȚzдД localeCompare(x)
aAĂÂåäeéiIÎSȘTȚzдД POSIX
aAĂÂåäeéiIÎSȘTȚzдД en-US-u-va-posix
aAĂÂåäeéiIÎSȘTȚzдД und
aAĂÂåäeéiIÎSȘTȚzдД en
AaĂÂeéIiÎSȘTȚzäåДд da
дДaAĂÂåäeéiIÎSȘTȚz ru

The Safari browser on my Mac gives the same result except for da:

aAĂÂeéiIÎSȘTȚzäåдД da

Firefox on my Mac gives a somewhat different result:

AISTaeizÂÎäåéĂȘȚДд sort()
aAĂÂeéiIÎSȘTȚzåäдД localeCompare(x)
aAĂÂeéiIÎSȘTȚzåäдД POSIX
aAĂÂåäeéiIÎSȘTȚzдД en-US-u-va-posix
aAĂÂeéiIÎSȘTȚzåäдД und
aAĂÂåäeéiIÎSȘTȚzдД en
AaĂÂeéIiÎSȘTȚzäåДд da
дДaAĂÂåäeéiIÎSȘTȚz ru

There is also the function Intl.getCanonicalLocales. Here are some results I found testing that:

$ node
> Intl.getCanonicalLocales("en-US-POSIX")
[ 'en-US-u-va-posix' ]
  • On recent Chrome, it works the same as in NodeJS above.
  • On recent Safari, getCanonicalLocales seems to accept almost any string and returns that string.
  • Recent Firefox behaves the same as recent Safari.
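A small sketch of how getCanonicalLocales could be wrapped to check a tag before using it. The helper name canonicalizeOrNull is hypothetical, and as the observations above suggest, how strictly a tag is validated may differ between engines.

```javascript
// Hypothetical helper: returns the canonical form of a locale tag, or null
// when the runtime rejects the tag with a RangeError.
function canonicalizeOrNull(tag) {
    try {
        return Intl.getCanonicalLocales(tag)[0];
    } catch (e) {
        if (e instanceof RangeError) return null;
        throw e;
    }
}

console.log(canonicalizeOrNull("en-us"));
console.log(canonicalizeOrNull("en_US"));
```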
Peter Lamberg

As @Bergi pointed out above, in many cases you can just use < on strings, which is well defined and culture-insensitive.

For example, in a custom sort, something like this would work:

["foo", "bar"].sort((a, b)=> a < b ? -1 : 1)

(In that simple case, .sort() would of course be equivalent and simpler.)
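Where the operator approach becomes useful is when sorting by a property rather than sorting plain strings. A minimal sketch, with compareCodeUnits and the people array being made-up names for illustration:

```javascript
// Hypothetical helper: compares two strings by UTF-16 code units,
// the same ordering the < operator and the default sort() use.
function compareCodeUnits(a, b) {
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

const people = [{ name: "bob" }, { name: "Zed" }, { name: "alice" }];
people.sort((x, y) => compareCodeUnits(x.name, y.name));
console.log(people.map(p => p.name).join(","));  // Zed,alice,bob
```

Note that uppercase letters sort before lowercase ones in this order, which is stable across platforms but not what a human reader would usually expect.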

Peter Lamberg