I have strings with a bunch of special characters. This works:

myString.upcase.tr('æ-ý','Æ-Ý')

However, it is not really cross-platform. It works like a charm on my Mac and Linux machines, but my Ruby installation on Windows won't accept it. Any pointers, workarounds, or solutions would be much appreciated!

MiningSam

2 Answers

If you are using Rails 3 or later, try the mb_chars method. For example:

 'æ-ý'.mb_chars.upcase

 => "Æ-Ý"

If you're not using Rails, try the unicode gem:

 require "unicode"
 Unicode::upcase('æ-ý')

Or you can override the String class's case methods so that they delegate to the gem:

require "unicode"

# Reopen String so the case methods delegate to the unicode gem.
# This affects every String in the process.
class String
  def downcase
    Unicode::downcase(self)
  end

  def downcase!
    replace(downcase)
  end

  def upcase
    Unicode::upcase(self)
  end

  def upcase!
    replace(upcase)
  end

  def capitalize
    Unicode::capitalize(self)
  end

  def capitalize!
    replace(capitalize)
  end
end

Yuri Karpovich

Unfortunately, it is impossible to correctly upcase or downcase a string without knowing the language and, in some cases, even the meaning of the text.

For example, in English the uppercase variant of i is I and the lowercase variant of I is i, but in Turkish the uppercase variant of i is İ and the lowercase variant of I is ı. In German, the uppercase variant of ß is SS, but so is the uppercase variant of ss, so to downcase, you need to understand the text, because e.g. MASSE could be downcased to either masse (mass) or maße (measurements).
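The two cases above can be tried directly. This is a minimal sketch, assuming Ruby 2.4 or later, where String#upcase and String#downcase accept a :turkic option; the helper names are hypothetical and exist only for illustration.

```ruby
# Assumes Ruby 2.4+ (case-mapping options such as :turkic).
# The helper names below are hypothetical, chosen for this sketch.

# Language-neutral mapping.
def default_upcase(text)
  text.upcase
end

# Turkic mapping: i <-> İ and ı <-> I.
def turkic_upcase(text)
  text.upcase(:turkic)
end

def turkic_downcase(text)
  text.downcase(:turkic)
end

puts default_upcase('i')     # I
puts turkic_upcase('i')      # İ
puts turkic_downcase('I')    # ı
puts default_upcase('maße')  # MASSE
puts 'MASSE'.downcase        # masse (the ß cannot be recovered)
```

The caller has to supply the language; nothing in the string itself says whether :turkic applies.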

Ruby before 2.4 takes the easy way out and uppercases/downcases only within the ASCII alphabet. Since 2.4, String#upcase and its siblings apply Unicode case mapping by default.
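ASCII-only mapping, and the tr workaround from the question, can be reproduced explicitly. This is a rough sketch, assuming Ruby 2.4 or later (where upcase accepts an :ascii option) and a UTF-8 source file; ascii_upcase and tr_upcase are hypothetical helper names.

```ruby
# Assumes Ruby 2.4+ for the :ascii option and a UTF-8 source file.
# ascii_upcase and tr_upcase are hypothetical names for this sketch.

# ASCII-only upcase: non-ASCII letters are left untouched.
def ascii_upcase(text)
  text.upcase(:ascii)
end

# The workaround from the question: ASCII upcase, then a tr over the
# code point range U+00E6..U+00FD mapped onto U+00C6..U+00DD.
def tr_upcase(text)
  ascii_upcase(text).tr('æ-ý', 'Æ-Ý')
end

puts ascii_upcase('smørrebrød')  # SMøRREBRøD
puts tr_upcase('smørrebrød')     # SMØRREBRØD
puts tr_upcase('1 ÷ 2')          # 1 × 2, because ÷ (U+00F7) lies inside the range
puts tr_upcase('à la carte')     # à LA CARTE, because à (U+00E0) lies outside it
```

The last two lines show the limits of a fixed tr range: it also rewrites the division sign and misses letters below æ.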

However, that only explains why your workaround is needed, not why it sometimes works and sometimes doesn't. Provided that you use the same Ruby version, the same Ruby implementation, and the same version of that implementation on all platforms, it should work. YARV doesn't use the underlying platform's string manipulation routines much (the same is true of most Ruby implementations; even JRuby rolls its own string routines for maximum compatibility instead of using Java's powerful string libraries), and it doesn't use any third-party libraries (such as ICU) except Onigmo, so platform differences are unlikely to be to blame. Different versions of Ruby use different versions of the Unicode Character Database, though (I believe it was updated at least once somewhere between 1.9 and 2.2), so a version mismatch might explain it.
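To check for such a mismatch, one option is to print the same few values on each machine and compare them. This is only a sketch: environment_report is a hypothetical helper, the UNICODE_VERSION key is assumed to be missing on older builds (hence the fallback), and the encoding fields are my own addition and not something named above.

```ruby
require 'rbconfig'

# Collects settings that may differ between machines.
# environment_report is a hypothetical helper name.
def environment_report(sample)
  {
    ruby_version: RUBY_VERSION,
    ruby_engine: RUBY_ENGINE, # "ruby" for YARV, "jruby" for JRuby
    ruby_platform: RUBY_PLATFORM,
    # Assumption: the key may be absent on older builds, so fall back.
    unicode_version: RbConfig::CONFIG.fetch('UNICODE_VERSION', 'unknown'),
    source_encoding: __ENCODING__.name,
    default_external: Encoding.default_external.name,
    sample_encoding: sample.encoding.name,
    sample_codepoints: sample.codepoints.map { |cp| format('U+%04X', cp) }
  }
end

environment_report('æ-ý').each { |key, value| puts "#{key}: #{value}" }
```

If the version lines match but the encoding or code point lines differ, the string literal is probably being read differently on that machine.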

Or it might be a genuine bug in YARV on Windows. Maybe try JRuby? It tends to be more consistent between platforms. In fact, on Windows it is more compatible with Ruby than Ruby (i.e. YARV) itself!

Jörg W Mittag