
I'm upgrading a Rails 3.2.13 application from Ruby 1.8.7-p370 to Ruby 1.9.3-p385. Since the upgrade, special characters in text retrieved from the database are garbled. For instance, the "é" in "café" comes out as garbage characters. My database is latin1 encoded. I'm using mysql2 (0.3.11), and my database.yml looks like this:

```yaml
development:
  adapter: mysql2
  encoding: latin1
  database: my_db
  username: root
  host: localhost
```

(The same problem is also happening in the production environment, which has the same database config.)

It appears that when ActiveRecord retrieves text from the database, it decodes it as if it were utf-8, not latin1 (or ISO-8859-1) as I've specified.
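To make the symptom concrete, here is a minimal pure-Ruby sketch (Ruby 1.9.3 or later, no database involved) of one way this kind of garbling happens. It assumes the common "double-encoded" situation, where the latin1 columns actually hold UTF-8 bytes; the question doesn't confirm that, so treat it as one hypothesis to check.

```ruby
# encoding: utf-8
# Sketch of "double encoding": UTF-8 bytes that are labelled as Latin-1
# and then transcoded to UTF-8 a second time.
original    = "café"                                    # UTF-8; "é" is the bytes C3 A9
mislabelled = original.dup.force_encoding("ISO-8859-1") # same bytes, Latin-1 label
garbled     = mislabelled.encode("UTF-8")               # each byte becomes its own character

puts original.bytes.map { |b| format("%02X", b) }.join(" ")  # 63 61 66 C3 A9
puts garbled                                                 # cafÃ©
```

The bytes never change until the final `encode`; it's the wrong label followed by a real transcode that turns one character into two.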

To diagnose the problem, I wrote a Ruby script that uses mysql2 to query the database directly, bypassing ActiveRecord:

```ruby
require 'rubygems'
require 'mysql2'

client = Mysql2::Client.new(:host => "localhost",
                            :username => "root",
                            :database => "food52_development_production",
                            :encoding => "latin1")

result = client.query('SELECT title FROM recipes WHERE id = 12934')

puts result.first["title"]
```

The recipe with id 12934 has the word "café" in its title. Running this script under 1.9.3 prints the correctly decoded text ("café"). If I change the :encoding option to "utf-8", I see the garbled text again.
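One way to make the comparison between the two runs less ambiguous is to print the encoding Ruby attached to the returned string along with its raw bytes. The helper below is hypothetical (`describe_string` is an illustrative name), and its inputs are built by hand so the sketch runs without MySQL; in the script above you would pass `result.first["title"]` instead.

```ruby
# encoding: utf-8
# Hypothetical diagnostic helper: the encoding Ruby attached to +str+,
# followed by its raw bytes in hex.
def describe_string(str)
  hex = str.bytes.map { |b| format("%02X", b) }.join(" ")
  "#{str.encoding}: #{hex}"
end

latin1_bytes = "caf\xE9".dup.force_encoding("ISO-8859-1") # genuine Latin-1 "café"
utf8_bytes   = "café"                                      # UTF-8 "café"

puts describe_string(latin1_bytes)  # ISO-8859-1: 63 61 66 E9
puts describe_string(utf8_bytes)    # UTF-8: 63 61 66 C3 A9
```

If the latin1 connection hands back `E9` for "é", the column holds genuine Latin-1; if it hands back `C3 A9`, the column is really storing UTF-8 bytes.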

I also tried placing a breakpoint in ActiveRecord::ConnectionAdapters to see what encoding configuration Rails was initializing the Mysql2::Client with. It is being passed :encoding => "latin1", as expected.

And yet: somewhere along the line, Rails decides to decode the text as utf-8. How do I get Rails to respect the latin1 encoding configuration I specified? Thanks in advance for your help.

hoffm

1 Answer


As of Ruby 1.9.3, iconv is deprecated in favor of String#encode. Also, Rails 3 expects UTF-8 encoding on all input.

With that said, you've got a couple of options. The first is pretty hacky, but if you don't want to migrate your data, it will work.

The iconv library is still available as a gem, which you should be able to use to do those conversions manually wherever your app needs them.

The guys over at Airbnb use a helper like this:

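```ruby
# Convert +str+ from encoding +from+ to encoding +to+.
def self.convert_string_encoding(to, from, str)
  if "1.9".respond_to?(:force_encoding)
    # Ruby 1.9+: use the built-in String#encode; characters with no
    # equivalent in the target encoding are replaced instead of raising.
    str = str.dup if str.frozen?
    str.encode(to, from, :undef => :replace)
  else
    # Ruby 1.8: fall back to the iconv library.
    require 'iconv'
    Iconv.conv(to, from, str)
  end
end
```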

to handle the conversions. You could potentially throw this in a helper for your views.

You can read more about their migration here
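On Ruby 1.9.3 the helper's first branch is just `String#encode`. A view helper built on the same call might look like the sketch below; the module and method names are hypothetical (not part of Rails), and it assumes the string really does contain Latin-1 bytes.

```ruby
# encoding: utf-8
# Hypothetical view helper wrapping the same String#encode call the
# helper above uses on Ruby 1.9+.
module EncodingViewHelper
  module_function

  # Transcode a Latin-1 string to UTF-8 for display; nil passes through.
  def latin1_to_utf8(str)
    return nil if str.nil?
    str.encode("UTF-8", "ISO-8859-1", :undef => :replace)
  end
end

title     = "caf\xE9".dup.force_encoding("ISO-8859-1") # Latin-1 bytes for "café"
converted = EncodingViewHelper.latin1_to_utf8(title)
puts converted.encoding  # UTF-8
puts converted           # café
```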

The problem will come when you try to convert Rails' default UTF-8 back to your database's encoding.
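A quick sketch of why the reverse direction is the hard part: Latin-1 can only represent 256 characters, so anything outside it either raises or gets replaced. This uses Ruby's ISO-8859-1, which is close to, but not identical to, MySQL's latin1 (MySQL's latin1 is actually cp1252).

```ruby
# encoding: utf-8
utf8 = "café ☃"  # "☃" (U+2603) has no Latin-1 equivalent

# Without options, an unmappable character raises.
begin
  utf8.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  puts "raised: #{e.class}"
end

# With :undef => :replace, it is silently replaced by "?".
latin1 = utf8.encode("ISO-8859-1", "UTF-8", :undef => :replace)
puts latin1.encode("UTF-8")  # café ?
```

The `:undef => :replace` option in the helper avoids the exception, but at the cost of quietly losing data on the way back into the database.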

What would probably make more sense is to migrate your existing data to UTF-8.

This article seems to cover that fairly well.

I hope this helps!

tylerdavis
  • So, `some_string.encode('ISO-8859-1', 'utf-8')` does the conversion just fine: https://gist.github.com/hoffm/5207153 However, I'm not sure how to use this information to fix my app. – hoffm Mar 20 '13 at 18:22
  • There are patches online that you can try for your database, or you can write a one-time script to re-encode all of the fields in your database to UTF-8. One of the issues is that Rails expects UTF-8 by default, so the re-encoding script might be your best bet moving forward. [Here's an article on how to convert your database.](http://climbtothestars.org/archives/2004/07/18/converting-mysql-database-contents-to-utf-8/) And here's [another article](http://yehudakatz.com/2010/05/17/encodings-unabridged/) on the encoding differences in Ruby 1.9 and Rails 3. – tylerdavis Mar 20 '13 at 19:10
  • Thanks, Tyler. I plan to do the conversion eventually, but I was hoping not to have to do it simultaneously with the 1.9 migration. – hoffm Mar 20 '13 at 19:44
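hoffm's first comment notes that `encode('ISO-8859-1', 'utf-8')` undoes the garbling. If the data does turn out to be double-encoded (an assumption; the thread doesn't establish it), a one-time re-encode script could apply something like the following to each text value. The method name is hypothetical, and the check is only a heuristic: legitimate text that happens to contain sequences like "Ã©" would also be "repaired".

```ruby
# encoding: utf-8
# Hypothetical repair for double-encoded text: transcode back to Latin-1 to
# recover the original bytes, then relabel (not transcode) them as UTF-8.
def repair_double_encoded(str)
  candidate = str.encode("ISO-8859-1", "UTF-8").force_encoding("UTF-8")
  # If the recovered bytes aren't valid UTF-8, the text wasn't double-encoded.
  candidate.valid_encoding? ? candidate : str
rescue Encoding::UndefinedConversionError
  # Characters outside Latin-1 mean this can't be double-encoded Latin-1 text.
  str
end

puts repair_double_encoded("cafÃ©")  # café
puts repair_double_encoded("café")   # café
```

Already-correct strings come back unchanged, which makes it reasonably safe to run over a whole column, though spot-checking the results is still wise.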