
I want to enter German umlauts in irb, but I get a strange error. I can enter any of äöü without problems, but each of ÄÖÜß leads to the following error:

$ irb
ruby-1.9.2-p136 :001 > ? # here I entered Ü but it displays only ?
/Users/lorenz/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:728:in `block in lex_int2': invalid byte sequence in UTF-8 (ArgumentError)

I have looked at a lot of SO questions about Ruby, RVM, and UTF-8, but none helped. Most are tied to Rails or database configuration. I specifically checked the following:

The locale is set correctly:

$ locale
LANG="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"

Terminal.app is set to Unicode (UTF-8) and Encoding.default_external is set correctly:

$ irb
ruby-1.9.2-p136 :001 > Encoding.default_external
 => #<Encoding:UTF-8>

Why is this still so difficult in Ruby?
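
The ArgumentError means irb's lexer was handed bytes that do not form valid UTF-8. As a rough diagnostic sketch, the snippet below prints the current default encodings and shows how Ruby treats the two-byte UTF-8 form of Ü (0xC3 0x9C) compared with a truncated sequence. The truncated input is a hypothetical illustration of mangled input, not a capture of what Terminal.app actually sends.

```ruby
# Show the encodings Ruby is using for I/O.
puts "default_external: #{Encoding.default_external}"
puts "default_internal: #{Encoding.default_internal.inspect}"

# "Ü" (U+00DC) is encoded in UTF-8 as the two bytes 0xC3 0x9C.
full = [0xC3, 0x9C].pack("C*").force_encoding(Encoding::UTF_8)
puts full.valid_encoding?        # prints true
puts full == "\u00DC"            # prints true

# Hypothetical mangled input: only the lead byte survives.
truncated = [0xC3].pack("C*").force_encoding(Encoding::UTF_8)
puts truncated.valid_encoding?   # prints false

# Matching a regexp against such a string raises the same kind of
# error that irb reported.
error_message =
  begin
    truncated =~ /x/
    nil
  rescue ArgumentError => e
    e.message
  end
puts error_message               # prints: invalid byte sequence in UTF-8
```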

Phrogz
Lorenz
  • Maybe it's a keyboard driver problem? Have you tried pasting the characters instead of typing them? – adamax Feb 13 '11 at 15:21
  • To help triangulate the problem, put the commands you're using in IRB into a source file and let Ruby run them. That will tell you if it's an IRB problem, or if Ruby itself is not happy. – the Tin Man Feb 13 '11 at 21:02
  • Looks like it's a problem with Terminal.app. I'm getting the same question-mark problem here, with OSX 10.6.6. I can enter an uppercase U with umlaut in xterm without a problem, however. (You can access xterm by launching X11 and choosing "Terminal" from the Applications menu.) Even after this fix, though, IRB can't handle it: if I enter `string = 'Ü'`, I get an "invalid multibyte char (UTF-8)" Ruby error. – Jon Gauthier Feb 20 '11 at 20:12
  • @adamax: the same thing happens when I copy and paste instead of typing. – Lorenz Feb 24 '11 at 16:26
  • @the Tin Man: same problem, slightly different error message: "test.rb:1: invalid multibyte char (US-ASCII)". It works fine after I add "# -*- encoding : utf-8 -*-" as the first line. – Lorenz Feb 24 '11 at 16:27
  • @hansengel: I cannot enter any German characters in xterm (tried äöüÄÖÜß) – Lorenz Feb 24 '11 at 16:31

3 Answers


For a source file, you usually set the encoding with a magic comment such as # coding: UTF-8 on the first line.

For irb, it might be necessary to set the encoding explicitly when you start it:

irb -E UTF-8:UTF-8

This sets both the external and the internal encoding to UTF-8 in irb.

Alternatively, or in addition, try

irb -U

which sets the internal encoding to UTF-8.
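
If you would rather not pass flags on every launch, one option (an assumption, not something tested against this exact setup) is to set the same defaults from a ~/.irbrc file, which irb evaluates at startup. Note that this only changes Ruby's default encodings; it does not change the bytes that the terminal or readline hand to irb, so it may not cure the input problem on its own.

```ruby
# Hypothetical ~/.irbrc contents; irb evaluates this file when it starts.
# Roughly the in-process counterpart of `irb -E UTF-8:UTF-8`
# (first value: external encoding, second: internal encoding).
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

puts "external: #{Encoding.default_external}, internal: #{Encoding.default_internal}"
```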

Dmytrii Nagirniak

I don't know how to solve the problem, but what is certain is that this is an irb-only issue. I have noticed many times that irb has its own unique way of dealing with user input (it may well be a limitation in readline), and it only works well with some characters.

You can check that with a simple test: create a new .rb file containing:

# encoding: utf-8
puts "test: Ü"

and execute it. Does it work?
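
A slightly extended version of that test also reports the string's encoding and bytes, which makes it easier to tell whether Ruby itself reads Ü correctly from a file. The expected output in the comments assumes the file is saved as UTF-8 with a precomposed Ü; the file name test.rb is only an example.

```ruby
# encoding: utf-8
# Save as a UTF-8 file (e.g. test.rb) and run it with `ruby test.rb`.
s = "test: Ü"
puts s
puts s.encoding              # prints UTF-8 (from the magic comment)
puts s.valid_encoding?       # prints true
p s.bytes.last(2)            # prints [195, 156], the UTF-8 bytes of Ü
```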

While it is still a nuisance, it has not been a big enough problem for me so far to bother really looking for a solution.

Schmurfy

If you're running on Mac OS, it might be a readline issue. See http://henrik.nyh.se/2008/03/irb-readline .
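
To check which line-editing library your Ruby is actually using, a small sketch like the following may help. It relies on the Readline::VERSION constant; Ruby builds linked against libedit instead of GNU Readline (common on Mac OS) report "EditLine wrapper" there. On newer Rubies, where readline may be backed by Reline or missing entirely, the output will differ, so treat this as a rough check only.

```ruby
# Report which line-editing library Ruby's readline extension is using.
# A value of "EditLine wrapper" indicates libedit rather than GNU Readline.
def readline_info
  require 'readline'
  defined?(Readline::VERSION) ? Readline::VERSION.to_s : 'unknown'
rescue LoadError
  'readline extension not available'
end

puts readline_info
```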

Marnen Laibow-Koser