DBVisualizer Unicode Bug

Question

I am facing a problem while inserting two characters, É (0xC389) and П (0xD0BF), into a database table that has charset UTF-8~~ and collation UTF-8 as well. Both of these characters fall in the range U+0800 - U+FFFF, so I understand that they require 16 bits~~.

The strange thing is that É (0xC389) is inserted into the table from DBVisualizer and displayed normally, but П (0xD0BF) is not saved properly. I first thought it might be a client issue, but why would it happen with one character and not with another that lies in the same range?

I am really surprised by this behaviour. Is my understanding of UTF-8 wrong, is this really a DBVisualizer bug, or am I missing something?

Note: Codepoints between U+0800 and U+FFFF are encoded as 24 bits (3 bytes) in UTF-8. — CharlotteBuff, Apr 06 '17 at 20:50
@RandomGuy32 It does require 24 bits but 8 bits are reserved, so can only use 16 bits, out of those 24 — SSC, Apr 07 '17 at 07:50
Please provide a couple more mis-rendered characters; maybe I can find a pattern. — Rick James, Apr 11 '17 at 05:54

score 1 · Answer 1 · answered Apr 09 '17 at 03:49

Your first sentence is wrong.

Are you looking at a mixture of Latin and Cyrillic? Or Hangul?

UTF-8 Hex    Unicode  Visible    Meaning
C389        201=x00C9   É        LATIN CAPITAL LETTER E WITH ACUTE
D09F       1055=x041F   П        CYRILLIC CAPITAL LETTER PE
EC8E89          xC389   쎉       HANGUL SYLLABLE SSENJ
ED82BF          xD0BF   킿       HANGUL SYLLABLE KIH

That is, É is Unicode U+00C9 ("codepoint 201"), and it is encoded as the two bytes hex C3 89 when stored as UTF-8 text. The other rows read the same way.

The first two require 16 bits in UTF-8; the other two need 24 bits. This is also the case for MySQL's CHARACTER SET utf8 or utf8mb4.
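As a rough check of the table above, here is a minimal Python sketch that prints the code point, the UTF-8 bytes, and the encoded size for each of the four characters. The helper name `describe` is made up for this illustration; it only uses the standard `str.encode` and `unicodedata.name` calls, and it says nothing about what DBVisualizer or MySQL actually do.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, int, str]:
    """Return (code point, UTF-8 hex, encoded size in bits, Unicode name).

    `describe` is a hypothetical helper for this example only.
    """
    encoded = ch.encode("utf-8")
    return (
        f"U+{ord(ch):04X}",
        encoded.hex().upper(),
        len(encoded) * 8,
        unicodedata.name(ch),
    )


# The four characters from the table, written as escapes to avoid
# any ambiguity about how this snippet itself is saved.
chars = ["\u00C9", "\u041F", "\uC389", "\uD0BF"]

for ch in chars:
    print(ch, *describe(ch))

# Going the other way: treat the hex values from the question as UTF-8 bytes.
for hex_value in ("C389", "D0BF"):
    decoded = bytes.fromhex(hex_value).decode("utf-8")
    print(hex_value, decoded, f"U+{ord(decoded):04X}")
```

Decoded this way, C389 gives É (U+00C9), while D0BF gives U+043F, the Cyrillic small letter pe, rather than the capital П (U+041F, UTF-8 D09F) listed in the table.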

See if you can get DBVisualizer to talk UTF-8, not Unicode.

Rick James

I have updated my question. These were actually hex values, not Unicode code points. Also, I have set the DBVisualizer settings to UTF-8, but nothing changed. – SSC Apr 09 '17 at 09:09
`É` is _not_ Unicode C389, it is UTF-8 C389. Those are _different_. How does one specify UTF-8 in DBVisualizer? What technique are you using to discover "C389"? – Rick James Apr 09 '17 at 15:45