Ruby 1.8.7 is not multibyte-character savvy the way 1.9+ is. In general, it treats a string as a series of bytes rather than characters. If you need better handling of such characters, consider upgrading to 1.9+.
James Gray has a series of articles about dealing with multibyte characters in Ruby 1.8. I highly recommend taking the time to read through them. It's a complex subject, so you'll want to read the entire series a couple of times.
Also, 1.8 encoding support needs the $KCODE flag set:
$KCODE = "U"
so you'll need to add that to code running in 1.8.
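If the same script has to run on both 1.8 and later versions, the assignment can be guarded so it only happens where it has an effect. This is a minimal sketch: the helper name `enable_utf8_on_18` is made up for illustration, and comparing `RUBY_VERSION` as a string is a simplification that is good enough for telling 1.8 from 1.9+.

```ruby
# encoding: UTF-8

# Hypothetical helper: set $KCODE only on Ruby 1.8, where it matters.
# Later versions track the encoding on each string instead.
def enable_utf8_on_18
  $KCODE = "U" if RUBY_VERSION < "1.9"
end

enable_utf8_on_18

chars = "éáéíóúÀÉÍÓÚ"
# Splitting on an empty regex yields one element per character
# when the interpreter is treating the string as UTF-8.
puts chars.split(//).length   # 11 on 1.9+
```

On 1.8 the split should become character-aware once $KCODE is set, but check it on your own interpreter rather than taking that on faith.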
Here is a bit of sample code:
#encoding: UTF-8
require 'rubygems'
require 'iconv'
chars = "éáéíóúÀÉÍÓÚ"
puts Iconv.iconv("ASCII//translit", "utf-8", chars)
puts chars.split('')
puts chars.split('').join
Using ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-darwin10.7.0] and running it in IRB, I get:
1.8.7 :001 > #encoding: UTF-8
1.8.7 :002 >
1.8.7 :003 > require 'iconv'
true
1.8.7 :004 >
1.8.7 :005 > chars = "\303\251\303\241\303\251\303\255\303\263\303\272\303\200\303\211\303\215\303\223\303\232"
"\303\251\303\241\303\251\303\255\303\263\303\272\303\200\303\211\303\215\303\223\303\232"
1.8.7 :006 >
1.8.7 :007 > puts Iconv.iconv("ASCII//translit", "utf-8", chars)
'e'a'e'i'o'u`A'E'I'O'U
nil
1.8.7 :008 >
1.8.7 :009 > puts chars.split('')
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
nil
1.8.7 :010 > puts chars.split('').join
éáéíóúÀÉÍÓÚ
At line 9 in the output I told Ruby to split the string into its concept of characters, which in 1.8.7 means bytes. Each resulting '?' is a lone byte of a multibyte character, which could not be displayed on its own. At line 10 I told it to split again, which produced the same array of bytes, and join then reassembled them into the original string, so the multibyte characters displayed normally.
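To see the bytes-versus-characters difference without depending on how a given Ruby version splits strings, `String#unpack` can be asked for either view explicitly. This is only a sketch; the helper names `byte_values` and `code_points` are made up for illustration.

```ruby
# encoding: UTF-8

# Raw byte values of the string, regardless of encoding.
def byte_values(str)
  str.unpack("C*")
end

# Unicode code points, decoding the bytes as UTF-8.
def code_points(str)
  str.unpack("U*")
end

chars = "éáéíóúÀÉÍÓÚ"
puts byte_values(chars).length    # 22, one per '?' in the 1.8.7 output
puts code_points(chars).length    # 11, one per character
puts byte_values("é").inspect     # [195, 169], i.e. "\303\251" in octal

# Packing the code points back rebuilds the original UTF-8 string,
# much like join did with the array of bytes.
puts code_points(chars).pack("U*") == chars
```

The 22 bytes line up with the 22 question marks above: each accented character here takes two bytes in UTF-8.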
Running the same code using Ruby 1.9.2 shows the expected, more desirable behavior:
1.9.2p290 :001 > #encoding: UTF-8
1.9.2p290 :002 >
1.9.2p290 :003 > require 'iconv'
true
1.9.2p290 :004 >
1.9.2p290 :005 > chars = "éáéíóúÀÉÍÓÚ"
"éáéíóúÀÉÍÓÚ"
1.9.2p290 :006 >
1.9.2p290 :007 > puts Iconv.iconv("ASCII//translit", "utf-8", chars)
'e'a'e'i'o'u`A'E'I'O'U
nil
1.9.2p290 :008 >
1.9.2p290 :009 > puts chars.split('')
é
á
é
í
ó
ú
À
É
Í
Ó
Ú
nil
1.9.2p290 :010 > puts chars.split('').join
éáéíóúÀÉÍÓÚ
Ruby maintained the multibyte-ness of the characters through the split('').
Notice that in both cases Iconv.iconv did the right thing: it created characters that were visually similar to the input characters. While the leading apostrophe (or backtick, for the grave accent) looks out of place, it's there as a reminder that the characters were originally accented.
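Iconv was dropped from the standard library in Ruby 2.0, so on newer versions a similar "visually close ASCII" result needs another route. The following is a sketch of one alternative, not a replacement for what Iconv does: it assumes Ruby 2.2+ for `String#unicode_normalize`, the helper name `strip_accents` is made up, and unlike the Iconv output above it drops the accent entirely instead of leaving an apostrophe marker.

```ruby
# encoding: UTF-8

# Hypothetical helper: decompose each accented character into a base
# letter plus combining marks (NFD), then remove the combining marks.
def strip_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("éáéíóúÀÉÍÓÚ")   # eaeiouAEIOU
```

This only handles characters that decompose into a base letter plus marks; letters such as "ø" or "ß" pass through unchanged.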
For more information, see the links on the right to related questions or try this SO search for [ruby] [iconv]