Sorting is complicated. The Oracle documentation gives a complete overview of the different aspects.
It'd be nice to know the exact sort you're trying to reproduce, i.e. the exact value of `NLS_SORT`. You can find out by executing:
```sql
SELECT SYS_CONTEXT('USERENV', 'NLS_SORT') FROM SYS.DUAL;
```
The sort you're using produces
A, a, Á, á, Ä, ä, B, b, C, c
It's not clear what order the input was in.
- It puts `A` before `a`. This is odd. I infer that it isn't actually preferring `A` over `a` but considers them equal, i.e. it is case-insensitive.
- It puts unaccented letters before letters with accents, so I infer that it is accent-sensitive.
An `NLS_SORT` of `GENERIC_M_CI` fits the bill. You can check this by running the following in Oracle:
```sql
[...] ORDER BY NLSSORT(<colname>, 'NLS_SORT=GENERIC_M_CI');
```
A Java `Collator` has a `setStrength()` method which accepts the values `PRIMARY`, `SECONDARY`, `TERTIARY` and `IDENTICAL`.
The exact interpretation depends on the locale, but the Javadoc gives these examples:
- The primary strength distinguishes between `a` and `b` only.
- The secondary strength also distinguishes between `a` and `á`.
- The tertiary strength also distinguishes between `a` and `A`.
- The identical strength is only satisfied if the characters are absolutely identical.
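To see these strengths side by side, here is a small sketch that compares the Javadoc's example pairs at each strength. It assumes `Locale.US`; since the assignment of differences to strengths is locale dependent, other locales may behave differently. The class name `StrengthDemo` is only illustrative.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {
    // Compares two strings with a US-English collator at the given strength.
    public static int compare(int strength, String x, String y) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(x, y);
    }

    public static void main(String[] args) {
        // "\u00e1" is á, written as an escape to avoid source-encoding issues
        String[][] pairs = {{"a", "b"}, {"a", "\u00e1"}, {"a", "A"}};
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY"};
        for (int i = 0; i < strengths.length; i++) {
            for (String[] p : pairs) {
                // A result of 0 means the collator treats the two strings as equal at this strength
                boolean equal = compare(strengths[i], p[0], p[1]) == 0;
                System.out.println(names[i] + ": " + p[0] + " vs " + p[1]
                        + (equal ? " -> equal" : " -> different"));
            }
        }
    }
}
```

At `SECONDARY`, `a` and `á` come out different while `a` and `A` come out equal, which is the case-insensitive, accent-sensitive behaviour inferred above.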
So a `Collator` with strength `SECONDARY`, which ignores case but not accents (like `GENERIC_M_CI`), should serve you fine.
On my machine, with an en_US default locale, I tried this out:
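```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
List<String> strings = Arrays.asList("A", "Ä", "Á", "B", "C", "a", "á", "ä", "b", "c");
Collator collator = Collator.getInstance(); // collator for the default locale (en_US here)
collator.setStrength(Collator.SECONDARY);  // ignore case, respect accents
Collections.sort(strings, collator);
System.out.println(strings);
```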
It prints:
```
[A, a, Á, á, Ä, ä, B, b, C, c]
```
(But if you had put the `a` before the `A` in the input, it would have left that order untouched: the collator considers them equal and `Collections.sort` is stable.)