5

I have a PHP web app with MySQL tables taking utf8 text. I recently converted the data from latin1 to utf8 along with the tables and columns accordingly. I did, however, forget to use mysql_set_charset and the latest incoming data I would assume came through the MySQL connection as latin1. I don't know what happens when latin1 comes in to a utf8 column, but it's causing some strange display issues for items like comma, quotes, ampersand, etc.

Now that mysql_set_charset is in place, it is pulling the data out with funky characters. Any way to convert the latin1-utf8 soup to straight utf8 now that i have the database connection resource using the correct charset?

scootklein
  • 708
  • 1
  • 8
  • 21
  • Too lazy to test around and sell it as my answer.. http://www.mysqlperformanceblog.com/2007/12/18/fixing-column-encoding-mess-in-mysql/ should help you out (some nice comments there too). If its possible to just get the rows inserted/updated during the period the set_charset() was missing you can even convert these rows practically without checking anything. – Kuchen Jul 04 '10 at 22:53

1 Answers1

4

Found the fix with your comment. Here was the SQL line that seemingly has solved my issue.

UPDATE table SET col = CONVERT(CONVERT(CONVERT(col USING latin1) USING binary) using utf8);

Even though the column is UTF8, it forces it to pull the data out as latin1, convert to binary, convert to utf8 and re-insert.

scootklein
  • 708
  • 1
  • 8
  • 21