
In a MySQL database I saved Persian sentences in a table using the utf8_unicode_ci collation. Then I changed the collation to utf8_persian_ci, but the results are the same; nothing changed.

What's the difference between "utf8_unicode_ci" and "utf8_persian_ci"?

  • Unicode is a superset of the Persian encoding. But because your data is all in Persian, "downgrading" from Unicode to Persian encoding means you don't lose any information. – Tim Biegeleisen Oct 18 '21 at 15:20
  • @TimBiegeleisen - I don't think there is any loss -- we are not talking about 'encoding', rather 'collation'. – Rick James Oct 31 '21 at 01:53

2 Answers


(I cannot speak as an authority specifically on Persian collations.) The general idea behind MySQL collations is

  • _bin -- just check the bits; this is usually useless for "words".
  • _general_ci -- Case and Accent Insensitive, and rather lame when it comes to all other 'rules'.
  • _unicode_ci, _unicode_520_ci, _0900_ai_ci -- Case and Accent Insensitive; based on Unicode standards 4.0.0, 5.2.0, 9.0.0; but unlikely to be "correct" for any particular language (Spanish, Persian, German, etc.)
  • _persian_ci (etc) -- Similar to one of the Unicode collations, but tuned for the language.
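
To make the difference between a binary and a case/accent-insensitive comparison concrete, here is a minimal Python sketch. It is only an analogy: MySQL's collations use their own weight tables, and the helper names `bin_key` and `ci_ai_key` are made up for this illustration.

```python
import unicodedata


def bin_key(s: str) -> bytes:
    # Analogy for a _bin collation: compare the raw UTF-8 bytes.
    return s.encode("utf-8")


def ci_ai_key(s: str) -> str:
    # Rough analogy for a case- and accent-insensitive collation
    # (an assumption for illustration, not MySQL's algorithm):
    # decompose, drop combining marks, then fold case.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["zebra", "Apple", "école", "eclair"]

bin_sorted = sorted(words, key=bin_key)
ci_sorted = sorted(words, key=ci_ai_key)

print(bin_sorted)  # ['Apple', 'eclair', 'zebra', 'école']
print(ci_sorted)   # ['Apple', 'eclair', 'école', 'zebra']
print(ci_ai_key("Résumé") == ci_ai_key("resume"))  # True
```

With the byte-wise key, "école" sorts after "zebra" because its first byte is larger; with the folded key it lands next to "eclair".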

An example of how specific collations may differ:

  • _spanish_ci -- treats "ch" as two letters, as most collations do
  • _spanish2_ci -- treats "ch" as a single letter: 'cz' < 'ch' < 'da'. (And other differences.)
  • _lithuanian_ci -- "ch" is the same as "c"; that is 'cha' = 'ca'.
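
As a rough sketch of what "treating 'ch' as a single letter" means, the Python below builds a sort key in which the digraph gets its own rank between "c" and "d". This is an illustration of the idea only, not how _spanish2_ci is implemented; `traditional_key`, `ALPHABET` and `RANK` are hypothetical names, and other Spanish letters such as "ñ" and "ll" are deliberately left out.

```python
import string

# Hypothetical alphabet: a-z with the digraph "ch" ranked right after "c".
ALPHABET = list(string.ascii_lowercase)
ALPHABET.insert(ALPHABET.index("c") + 1, "ch")
RANK = {element: i for i, element in enumerate(ALPHABET)}


def traditional_key(word: str) -> list:
    # Split the word into collation elements, consuming "ch" as one unit.
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        if w.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the table sort after all known elements.
            key.append(RANK.get(w[i], len(RANK)))
            i += 1
    return key


words = ["da", "ch", "cz"]
print(sorted(words))                       # ['ch', 'cz', 'da']
print(sorted(words, key=traditional_key))  # ['cz', 'ch', 'da']
```

The second line reproduces the 'cz' < 'ch' < 'da' ordering from the list above, while the plain sort treats "ch" as "c" followed by "h".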

I assume there may be differences between _persian_ci and _unicode_ci in how Persian characters are ordered. A glance at http://mysql.rjweb.org/utf8_collations.html and http://mysql.rjweb.org/utf8mb4_collations.html seems to say that Western European characters are collated the same in those collations.

Rick James
  1. utf8_persian_ci performs better when storing Persian text and characters.
  2. When you sort Persian text in utf8_persian_ci, the letters "پ", "گ", "ژ", "چ" are in their correct place (order), but in utf8_unicode_ci they come after "ی".
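
To illustrate why a language-tuned ordering matters for these letters, here is a small Python sketch that compares plain code-point order with an explicit Persian alphabet table. It does not reproduce either MySQL collation: code-point order is only a stand-in for an untailored ordering, and `PERSIAN_ALPHABET`, `RANK` and `persian_key` are names invented for this example.

```python
# The 32 letters of the Persian alphabet in dictionary order, written as
# escapes so the exact code points are unambiguous (alef, beh, peh, teh, ...).
PERSIAN_ALPHABET = (
    "\u0627\u0628\u067e\u062a\u062b\u062c\u0686\u062d"
    "\u062e\u062f\u0630\u0631\u0632\u0698\u0633\u0634"
    "\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642"
    "\u06a9\u06af\u0644\u0645\u0646\u0648\u0647\u06cc"
)
RANK = {ch: i for i, ch in enumerate(PERSIAN_ALPHABET)}


def persian_key(word: str) -> list:
    # Rank each character by its alphabet position; anything outside the
    # table sorts after all Persian letters, in code-point order.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]


BEH, PEH, TEH = "\u0628", "\u067e", "\u062a"
letters = [TEH, PEH, BEH]

by_code_point = sorted(letters)
by_alphabet = sorted(letters, key=persian_key)

print(by_code_point == [BEH, TEH, PEH])  # True: peh (U+067E) sorts last
print(by_alphabet == [BEH, PEH, TEH])    # True: peh sits between beh and teh
```

Peh, cheh, zheh and gaf were added to the Arabic block at higher code points, so any ordering that follows raw code points pushes them behind the letters they should sit next to.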