
I am converting an old application that uses a MySQL database encoded as latin1. To do this, I wrote a simple rake task and a few classes that connect to the legacy database:

class LegacyComment < ActiveRecord::Base
    establish_connection :legacy
end

The problem is that no matter how I try to convert the old latin1 data to utf8, I get odd characters:

"What he didn’t expect"

I've tried creating a duplicate table in the legacy database and then running

ALTER TABLE legacy_comments CONVERT TO CHARACTER SET utf8; 

I've tried Ruby's string.encode method, as suggested in other answers I found here, and the approach described at http://jalada.co.uk/2011/12/07/solving-latin1-and-utf8-errors-for-good-in-ruby.html, both to no avail.

I've tried various settings in database.yml, all with no luck. I'm not sure where to go next.

Kansha
  • What you are getting is a UTF-8 string incorrectly decoded as Windows-1252. The data you are retrieving *is* UTF-8, but it looks like you are viewing it in something like Notepad, which uses Windows-1252 as the default codec in US Windows. How are you displaying the string? – Mark Tolonen Jan 30 '13 at 02:22
  • No, sorry, that's not it. I'm displaying it as utf8 output to the browser (yes, with utf8-encoded HTML5), to the terminal, etc. After a lot of work on this today, I determined it was easier to manually edit the few hundred records that mysteriously had this problem. Thankfully, out of 50,000ish records only a few hundred had these "corruptions." It feels silly that I had to do this, so I'm leaving the question open in case someone knows something, and then maybe someone else won't have to do what I did :-\ – Kansha Jan 30 '13 at 09:07
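For readers hitting the same symptom, here is a minimal Ruby sketch of the diagnosis in the first comment: text whose UTF-8 bytes were decoded as Windows-1252 at some point and then stored again. It only applies if the affected rows really are double-encoded in that way, which the thread does not confirm. The helper name `repair_mojibake` is hypothetical and not part of Rails or of the original post.

```ruby
# Hypothetical helper (not from the original post): undo one round of
# "UTF-8 bytes misread as Windows-1252" mojibake.
def repair_mojibake(text)
  # Turn each character back into the Windows-1252 byte it was misread from.
  bytes = text.encode("Windows-1252")
  # Reinterpret those bytes as the UTF-8 they originally were.
  repaired = bytes.force_encoding("UTF-8")
  # If the result is not valid UTF-8, the text was probably fine; keep it.
  repaired.valid_encoding? ? repaired : text
rescue Encoding::UndefinedConversionError
  # Some character has no Windows-1252 byte, so this was not that kind of mojibake.
  text
end

puts repair_mojibake("What he didn’t expect")
```

Strings that are already correct pass through unchanged, which matters when only a few hundred of roughly 50,000 records are affected. It would still be worth spot-checking the output on a copy of the table before writing anything back.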

0 Answers