I am facing a problem inserting two characters, É (0xC389) and П (0xD0BF), into a database table whose charset and collation are both UTF-8. Both characters fall in the range U+0800 - U+FFFF, so I understand that each of them requires 16 bits.

The strange thing is that É (0xC389) is inserted into the table from DBVisualizer and displayed normally, but П (0xD0BF) is not saved properly. I first thought it might be a client issue, but why would it happen with one character that lies in the same range as the other?

I am really surprised by this behaviour. Is my understanding of UTF-8 wrong, is this really a DBVisualizer bug, or am I missing something?

Rick James
SSC

1 Answer

Your first sentence is wrong.

Are you looking at a mixture of Latin and Cyrillic? Or Hangul?

UTF-8 Hex    Unicode  Visible    Meaning
C389        201=x00C9   É        LATIN CAPITAL LETTER E WITH ACUTE
D09F       1055=x041F   П        CYRILLIC CAPITAL LETTER PE
EC8E89          xC389   쎉       HANGUL SYLLABLE SSENJ
ED82BF          xD0BF   킿       HANGUL SYLLABLE KIH

That is, É is Unicode U+00C9 (decimal code point 201), and it is encoded as the two bytes hex C3 89 when stored as UTF-8 text. The other rows read the same way.

The first two require 16 bits in UTF-8; the other two need 24 bits. This is also the case for MySQL's CHARACTER SET utf8 or utf8mb4.
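To check the table yourself, here is a minimal Python sketch (Python is my choice for illustration; it has nothing to do with DBVisualizer or MySQL). The helper name `utf8_hex` is made up for this example. It encodes each code point from the table and then decodes the two hex values from the question as UTF-8 bytes.

```python
# Code points from the table above.
code_points = [0x00C9, 0x041F, 0xC389, 0xD0BF]


def utf8_hex(cp: int) -> str:
    """Return the UTF-8 encoding of one code point as uppercase hex.

    Hypothetical helper, named only for this example.
    """
    return chr(cp).encode("utf-8").hex().upper()


for cp in code_points:
    encoded = utf8_hex(cp)
    # Each pair of hex digits is one byte, i.e. 8 bits.
    print(f"U+{cp:04X} {chr(cp)} -> {encoded} ({len(encoded) // 2 * 8} bits)")

# The other direction: treat the hex values from the question as UTF-8 bytes.
for hex_bytes in ["C389", "D0BF"]:
    ch = bytes.fromhex(hex_bytes).decode("utf-8")
    print(f"bytes {hex_bytes} -> {ch} (U+{ord(ch):04X})")
```

Note that the bytes D0 BF decode to U+043F, the small letter п, while the capital П (U+041F) is D0 9F, so it is worth double-checking which of the two you are actually inserting.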

See if you can get DBVisualizer to talk UTF-8, not Unicode.

Rick James
  • Updated my question. These were actually hex values, not Unicode. Also, I have set DBVisualizer's settings to UTF-8, but nothing changed. – SSC Apr 09 '17 at 09:09
  • `É` is _not_ Unicode C389, it is UTF-8 C389. Those are _different_. How does one specify UTF-8 in DBVisualizer? What technique are you using to discover "C389"? – Rick James Apr 09 '17 at 15:45