0

Note that this question is NOT about searching for (non)accented characters.

Suppose I have a table where there is a column name, with collation utf8mb4_unicode_ci.

This collation works perfectly for the purpose of selecting the base selection
in a case-insensitive, accent-insensitive way.

The problem is that I need to order the results in an accent-sensitive and case-insensitive way.

The purpose of this is to select every name starting with some character/string and sort them "alphabetically", first should be not-accented, then accented.

From selection e.g.:

Črpw
Cewo
céag
čefw

The final results should be:

Cewo
céag -- because accented e is more than non-accented
čefw  
Črpw -- because r is more than e

Note that c/C < č/Č , but lower/upper cases are handled as equals.

I tried searching for this problem, but there are only popping similar questions or questions about searching, which is not the case, the searching itself is fine.

From mentioned I've tried this test query:

SELECT * FROM
(SELECT 'Črpw' as t
UNION SELECT 'Cewo'
UNION SELECT 'céag'
UNION SELECT 'čefw')virtual
ORDER BY t COLLATE utf8mb4_czech_ci ASC

Which produces something very similar to what I want

céag
Cewo
čefw
Črpw

But note that é gets ordered before e.

Is there a way how to get to the results order I want to have?

Using: MySQL 5.5.54 (Debian)

Community
  • 1
  • 1
jave.web
  • 13,880
  • 12
  • 91
  • 125
  • The only collations available for utf8 (until version 8.0) are case_sensitive and accent_sensitive. I don't think you could use `UPPER()` or `LOWER()` to help out. – Rick James Apr 07 '17 at 19:55
  • @RickJames yep, I've tried that too... Anyway, this seems to vary across the letters, e.g. `é` gets before `e` but `ř` goes *after* `r` ... – jave.web Apr 07 '17 at 20:06
  • According to my notes, c < č and r < ř, but e = é. These are, as far as I know, "correct" for Czech. It may _appear_ that e > é, but that may be because the ordering is unpredictable. For example `'céag' < 'Cewo'` because `'a' < 'w'`, which is noticed after deciding that `'cé' = 'Ce'`. – Rick James Apr 07 '17 at 23:41
  • utf8[mb4]_czech_ci differs from other collations mostly by having the characters (r/c/s/z) accented with caron coming after the letter plain or other accents. `ch` is also different in that it comes between `h` and `i`; ditto for Slovak. [_More details_](http://mysql.rjweb.org/utf8mb4_collations.html) – Rick James Apr 07 '17 at 23:46

0 Answers0