Does MySQL UTF8 collation fit Japanese and Korean characters?

Question

I've set every character set and collation to UTF8 in PHP and MySQL, and so far there is no problem. But according to http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html, the standard utf8 character set (and so collations such as utf8_general_ci) uses at most three bytes per character. That should be enough to store all BMP characters. However, I have found no indication of whether all Korean and Japanese characters are in the BMP, or whether some of them need four bytes. I simply want to know: are utf8_general_ci and utf8_bin really enough to store all Korean/Japanese characters, or do I have to use utf8mb4_general_ci and utf8mb4_bin?

Joni · Accepted Answer · 2013-09-10T10:24:51.027

2

The most frequently used characters are in the BMP, so they fit in the three bytes that utf8 allows. The characters in higher planes are mostly rare or historic, but some of them may be in use, for example in personal names. If you can use utf8mb4, you probably should.
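To make the BMP distinction concrete, here is a minimal Python sketch that checks whether a string contains any character above U+FFFF, which is the case where MySQL's three-byte utf8 is not enough. The function name `needs_utf8mb4` and the sample characters are illustrative choices, not part of MySQL or PHP.

```python
def needs_utf8mb4(text):
    """Return True if any character lies outside the BMP (code point > U+FFFF).

    Such characters take four bytes in UTF-8, so they do not fit in
    MySQL's three-byte utf8 character set. (Hypothetical helper name.)
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# Sample characters, chosen for illustration only.
samples = {
    "hiragana": "\u3042",             # U+3042
    "katakana": "\u30a2",             # U+30A2
    "hangul": "\ud55c",               # U+D55C
    "kanji (BMP)": "\u5409",          # U+5409
    "kanji (Ext. B)": "\U00020BB7",   # U+20BB7, a variant of U+5409 seen in names
    "emoji": "\U0001F600",            # U+1F600
}

for label, ch in samples.items():
    byte_count = len(ch.encode("utf-8"))
    print(f"{label}: U+{ord(ch):04X}, {byte_count} bytes, "
          f"needs utf8mb4: {needs_utf8mb4(ch)}")
```

Running a check like this over existing data before choosing a character set shows whether any stored text would be affected; it says nothing about text that users may submit later.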

edited Sep 10 '13 at 10:24

answered Sep 10 '13 at 10:17


3

Back to the question... Yes, Korean, Katakana, and Hiragana fit into utf8 as well as utf8mb4. However, Kanji, if that means 'all' of Chinese, does not. `utf8mb4` is needed for some Chinese characters and for Emoji; neither of these is "rare or historic". – Rick James Apr 29 '16 at 04:19
