What's the current best way to transliterate characters to 7-bit ASCII in Ruby? Most of questions I've seen on SO are 3 or 4 years old and the solutions don't fully work.
I want a method that will work for a wide range of Latin alphabets and, for example, convert
Your résumé’s a non–encyclopædia
to
Your resume's a non-encyclopaedia
but I cannot find a way that does that, particularly for folding 8-bit ASCII to 7-bit ASCII.
s = "Your r\u00e9sum\u00e9\u2019s a non\u2013encyclop\u00e6dia"
puts Iconv.iconv('ascii//ignore//translit', 'utf-8', s)
# => Your r'esum'e's a non-encyclopaedia
puts s.encode('ascii//ignore//translit', 'utf-8')
# => Encoding::ConverterNotFoundError: code converter not found (UTF-8 to ascii//ignore//translit)
puts s.encode('ascii', 'utf-8')
# Encoding::UndefinedConversionError: U+00E9 from UTF-8 to US-ASCII
puts s.encode('ascii', 'utf-8', invalid: :replace, undef: :replace)
# Your r?sum??s a non?encyclop?dia
puts I18n.transliterate(s)
# Your resume?s a non?encyclopaedia
Since Iconv
is deprecated I'd rather not use that if I don't have to, but I'd do it if that is the only thing that works. Obviously I could put in custom 8-bit ASCII to 7-bit ASCII translations, but I'd prefer to use a supported solution that has been thoroughly tested.
The translation is handled fine by International Components for Unicode with its Latin-ASCII translation, but that is only available for Java and C.
UPDATE
What I ended up doing was writing my own character translation routines to take care of punctuation and whitespace, after which I could use I18n.transliterate
to do the rest. I'd still prefer finding and using a well-maintained library function to handle the stuff I18n
does not.