19

I have a MySQL table properly set to the UTF-8 character set. I suspect some data inserted into one of my columns has been double encoded. I am expecting to see a non-breaking space character (UTF-8 0xC2A0), but what I get when selecting this column out of this table is four octets (0xC3A2 0xC2A0). That's what I would expect to see if at some point somebody had treated an UTF-8 0xC2A0 as ISO-8859-1 then attempted to encode again to UTF-8 before inserting into MySQL.

My test above where I am seeing the four octets involves selecting this column out of MySQL with Perl's DBD::mysql. I'd like to take Perl and DBD::mysql out of the equation to verify that those four octets are actually what MySQL has stored. Is there a way to do this directly with a SQL query?

Ryan Olson
  • 2,766
  • 4
  • 29
  • 36
  • HEX did it, and looks like I was right about the double encoding in the table. I'll need to look at the upstream system to deal with the encoding problem. – Ryan Olson Dec 10 '09 at 16:06
  • Ryan! How the heck do you know so much about UTF8 that you are able to make this guess correctly!? – benathon Jan 16 '13 at 07:14

3 Answers3

31
mysql> SELECT HEX(name) FROM mytable;
+-----------+
| hex(name) |
+-----------+
| 4142C2A0  | 
+-----------+
bobince
  • 528,062
  • 107
  • 651
  • 834
4

Why not try the BINARY operator?

"The BINARY operator casts the string following it to a binary string. This is an easy way to force a column comparison to be done byte by byte rather than character by character."

http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html

Hope this helps!

simeonwillbanks
  • 1,459
  • 16
  • 18
4

You could try using the HEX() function [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex]. From the documentation, SELECT HEX('abc'); returns 616263.

Matt McClellan
  • 1,639
  • 1
  • 10
  • 12