I've set all collations and character sets to UTF8 in PHP and MySQL, and that works without problems. But as described at http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html, the standard `utf8` character set (the one behind the `utf8_general_ci` collation) stores each character in at most three bytes. That should be enough for every character in the Basic Multilingual Plane (BMP). However, I still haven't found out whether all Korean and Japanese characters are in the BMP, or whether some of them need four bytes. I simply want to know: are `utf8_general_ci` and `utf8_bin` really enough to store all Korean/Japanese characters, or do I have to use `utf8mb4_general_ci` and `utf8mb4_bin`?
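The three-byte limit comes from how UTF-8 encodes code points. Everything up to U+FFFF (the BMP) takes at most three bytes, and only supplementary-plane code points (U+10000 to U+10FFFF) take four. Below is a minimal Python sketch of that length rule, with Python's own encoder used as a cross-check. The function name `utf8_length` is illustrative only. It is not part of MySQL or PHP.

```python
def utf8_length(cp: int) -> int:
    """Return how many bytes UTF-8 needs for code point cp (RFC 3629 ranges)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is not a Unicode code point")
    if cp <= 0x7F:
        return 1  # ASCII
    if cp <= 0x7FF:
        return 2  # e.g. Latin-1 supplement, Greek, Cyrillic
    if cp <= 0xFFFF:
        return 3  # rest of the BMP: kana, Hangul, common CJK ideographs
    return 4      # supplementary planes: only utf8mb4 can store these


# Compare the rule with Python's built-in UTF-8 encoder on a few code points.
for cp in (0x41, 0xE9, 0x3042, 0xD55C, 0xFFFD, 0x20BB7, 0x1F600):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_length(cp)} bytes")
```

So "fits in 3-byte `utf8`" and "is in the BMP" are the same condition. The real question is which Korean and Japanese characters sit above U+FFFF.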

rabudde
1 Answer
The most frequently used characters are in the BMP. The characters in higher planes are mostly rare and historic, but some of them are used in personal names, for example. If you can use `utf8mb4`, you probably should.
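If a column has to stay on 3-byte `utf8`, you can check input before inserting it. The Python sketch below lists every character such a column cannot hold, which means every character outside the BMP. The helper name `chars_needing_utf8mb4` is hypothetical and does not come from any MySQL or PHP API.

```python
from typing import List, Tuple


def chars_needing_utf8mb4(text: str) -> List[Tuple[int, str, str]]:
    """Return (position, character, code point) for every character
    outside the BMP, i.e. every character that needs 4 bytes in UTF-8."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF
    ]


# A name written with U+20BB7, a variant form of 吉 sometimes used in names.
name = "𠮷田さん"
print(chars_needing_utf8mb4(name))
```

An empty list means the text fits in `utf8`. Anything else needs `utf8mb4`.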

Joni
Back to the question... Yes, Korean, Katakana, and Hiragana fit into `utf8` as well as `utf8mb4`. However, Kanji, if it means 'all' of Chinese, does not. `utf8mb4` is needed for some Chinese characters and for Emoji, and neither of these is "rare or historic". – Rick James Apr 29 '16 at 04:19
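You can check the comment's breakdown directly by encoding one sample character from each group. The Python sketch below does this. The sample characters are illustrative choices. U+20BB7 stands in for the Kanji that live in CJK Extension B, outside the BMP. The helper `required_charset` is hypothetical.

```python
# One illustrative character per group mentioned in the comment.
SAMPLES = [
    ("Hangul syllable", "한"),           # U+D55C
    ("Hiragana", "あ"),                  # U+3042
    ("Katakana", "カ"),                  # U+30AB
    ("Common Kanji", "漢"),              # U+6F22
    ("Kanji in CJK Extension B", "𠮷"),  # U+20BB7
    ("Emoji", "😀"),                     # U+1F600
]


def required_charset(ch: str) -> str:
    """'utf8' if ch fits in MySQL's 3-byte utf8, otherwise 'utf8mb4'."""
    return "utf8" if len(ch.encode("utf-8")) <= 3 else "utf8mb4"


for label, ch in SAMPLES:
    n = len(ch.encode("utf-8"))
    print(f"{label:26} U+{ord(ch):04X}  {n} bytes  -> {required_charset(ch)}")
```

For these samples, only the Extension B Kanji and the emoji need four bytes. That matches the comment: Hangul and kana fit in either character set, while some Kanji and all emoji require `utf8mb4`.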