MySQL - What's the difference between "utf8_unicode_ci" and "utf8_persian_ci"?

Question

In a MySQL database, I saved Persian sentences as Unicode (utf8_unicode_ci) in a table. Then I changed the collation to utf8_persian_ci, but the results are the same. Nothing changed.

What's the difference between "utf8_unicode_ci" and "utf8_persian_ci"?

Unicode is a superset of the Persian encoding. But because your data is all in Persian, "downgrading" from Unicode to Persian encoding means you don't lose any information. — Tim Biegeleisen, Oct 18 '21 at 15:20
@TimBiegeleisen - I don't think there is any loss -- we are not talking about 'encoding', rather 'collation'. — Rick James, Oct 31 '21 at 01:53

Rick James · Answer 1 · 2021-10-31T01:51:49.940

(I cannot speak as an authority specifically on Persian collations.) The general idea behind MySQL collations is:

_bin -- just check the bits; this is usually useless for "words".
_general_ci -- Case and Accent Insensitive, and rather lame when it comes to all other 'rules'.
_unicode_ci, _unicode_520_ci, _0900_ai_ci -- Case and Accent Insensitive; based on Unicode Collation Algorithm versions 4.0.0, 5.2.0, 9.0.0; but unlikely to be "correct" for any particular language (Spanish, Persian, German, etc.)
_persian_ci (etc) -- Similar to one of the Unicode collations, but tuned for the language.
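To make the difference between a _bin collation and a case- and accent-insensitive one concrete, here is a minimal Python sketch. It only approximates the idea (exact code-point comparison vs. comparison after stripping accents and case); the helper names are hypothetical, and it does not reproduce MySQL's actual collation weight tables.

```python
import unicodedata


def bin_equal(a: str, b: str) -> bool:
    # Rough analogue of a _bin collation: compare exact code points.
    return a == b


def ci_ai_key(s: str) -> str:
    # Rough analogue of a case- and accent-insensitive collation
    # (an assumption for illustration, not MySQL's real weights):
    # decompose accented characters (NFD), drop combining marks,
    # then casefold.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def ci_ai_equal(a: str, b: str) -> bool:
    return ci_ai_key(a) == ci_ai_key(b)


if __name__ == "__main__":
    print(bin_equal("Résumé", "resume"))    # False
    print(ci_ai_equal("Résumé", "resume"))  # True
```

A language-tuned collation such as _persian_ci goes further than this: it also decides the relative order of letters, not just which variants count as equal.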

An example of how specific collations may differ:

_spanish_ci -- treats "ch" as two letters, as with most collations
_spanish2_ci -- treats "ch" as a single letter: 'cz' < 'ch' < 'da'. (And other differences.)
_lithuanian_ci -- "ch" is the same as "c"; that is 'cha' = 'ca'.
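The three "ch" behaviors listed above can be imitated with alphabet-based sort keys. This is a hypothetical Python sketch for lowercase ASCII words only; the key functions are illustrative names, not MySQL's real _spanish_ci, _spanish2_ci, or _lithuanian_ci tables, which cover many more rules.

```python
import string

# Hypothetical sort keys for lowercase ASCII words, illustrating how a
# collation may treat the digraph "ch". Not MySQL's real tables.
MODERN = list(string.ascii_lowercase)           # "ch" is just c + h
TRADITIONAL = MODERN[:3] + ["ch"] + MODERN[3:]  # "ch" is a letter after "c"

MODERN_RANK = {t: i for i, t in enumerate(MODERN)}
TRAD_RANK = {t: i for i, t in enumerate(TRADITIONAL)}
# Per the answer above: "ch" compares the same as "c".
LITH_RANK = {**MODERN_RANK, "ch": MODERN_RANK["c"]}


def tokens(word: str, digraph: bool) -> list[str]:
    """Split a word into collation elements, optionally keeping "ch" whole."""
    out, i = [], 0
    while i < len(word):
        if digraph and word.startswith("ch", i):
            out.append("ch")
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def spanish_like_key(word: str) -> list[int]:
    return [MODERN_RANK[t] for t in tokens(word, digraph=False)]


def spanish2_like_key(word: str) -> list[int]:
    return [TRAD_RANK[t] for t in tokens(word, digraph=True)]


def lithuanian_like_key(word: str) -> list[int]:
    return [LITH_RANK[t] for t in tokens(word, digraph=True)]


if __name__ == "__main__":
    words = ["da", "ch", "cz", "ca"]
    print(sorted(words, key=spanish_like_key))   # ['ca', 'ch', 'cz', 'da']
    print(sorted(words, key=spanish2_like_key))  # ['ca', 'cz', 'ch', 'da']
    print(lithuanian_like_key("cha") == lithuanian_like_key("ca"))  # True
```

The only thing that changes between the three keys is how "ch" is tokenized and ranked, which is the kind of per-language tuning the answer describes.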

I assume there may be differences between _persian_ci and _unicode_ci in the Persian characters. A glance at http://mysql.rjweb.org/utf8_collations.html and http://mysql.rjweb.org/utf8mb4_collations.html seems to say that Western European characters are collated the same in those collations.

score 0 · Answer 2 · answered Mar 06 '22 at 08:31

utf8_persian_ci handles Persian text and characters better.
When you sort Persian text with utf8_persian_ci, the letters "پ", "گ", "ژ", "چ" are in their correct place (order), but with utf8_unicode_ci they come after "ی".

Hamidreza Rezaei
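As a rough illustration of what "correct place (order)" means, the Python sketch below compares plain code-point order (Python's default string sort, roughly what a _bin collation gives) with an order driven by the Persian alphabet. It is a sketch under assumptions: `PERSIAN_ALPHABET` and `persian_key` are hypothetical names, and it does not reproduce the actual weights of utf8_unicode_ci or utf8_persian_ci.

```python
# Persian alphabet in its conventional order (32 letters).
PERSIAN_ALPHABET = "ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"
RANK = {letter: i for i, letter in enumerate(PERSIAN_ALPHABET)}


def persian_key(word: str) -> list[int]:
    # Hypothetical key: letters are ranked by alphabet position; any other
    # character sorts after all Persian letters, by code point.
    return [RANK.get(ch, len(PERSIAN_ALPHABET) + ord(ch)) for ch in word]


if __name__ == "__main__":
    words = ["تو", "پا", "گل", "لب"]
    # Plain code-point order: "پ" (U+067E) and "گ" (U+06AF) land after
    # letters such as "ت" (U+062A) and "ل" (U+0644).
    print(sorted(words))
    # Alphabet order: پا, تو, گل, لب
    print(sorted(words, key=persian_key))
```

The letters "پ", "چ", "ژ", "گ" were added to the Arabic script for Persian, so their code points sit outside the run of the basic Arabic letters; an ordering based on raw code points therefore misplaces them, which is why a language-specific collation is needed.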
