
I have some Perl code which is executed in a context where all command line arguments, inputs and outputs are encoded in the encoding given by the LC_CTYPE environment variable (or more generally the LC_CTYPE setting determined from the environment). This is exactly what use locale is for, right?

$ echo àé | perl -e 'use locale; print uc <>'
ÀÉ

This works in unibyte locales such as Latin-1, but not in UTF-8, where this program outputs àé on my Debian wheezy machine.

perl -CLADS -e 'use locale; print uc <>' seems to do the right thing in unibyte locales and UTF-8, at least according to my understanding of the documentation of -C. I don't understand how I'm supposed to deduce that from the perllocale documentation though, nor what would happen in multibyte locales other than UTF-8.

Furthermore I actually don't want to run the whole program in this mode, only one code block. In fact I can't pass parameters to the Perl interpreter, I can only pass a string to a Perl script which calls eval on that string. use locale's local scope would be just fine, but how do I activate -C from within?

The read-only magic variable ${^UNICODE}

… so not that then.

How do I run a snippet of Perl code in a mode where all strings (including @ARGV and file input/output) are interpreted according to the locale indicated by the environment?

Gilles 'SO- stop being evil'
  • One citation: [perldelta v5.20(!)](http://perldoc.perl.org/5.20.0/perldelta.html#use-locale-now-works-on-UTF-8-locales). _Until this release, only single-byte locales, such as the ISO 8859 series were supported. Now, the increasingly common multi-byte UTF-8 locales are also supported. A UTF-8 locale is one in which the character set is Unicode and the encoding is UTF-8._ – clt60 May 28 '15 at 20:26

1 Answer


It seems that perlrun describes -C as a combination of binmode and use open, so this will probably work (on *nix).

Update: decoding @ARGV with a little help from open.pm :)

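{
    use Encode ();
    require encoding;    # only for the private helper that open.pm also calls
    local @ARGV = @ARGV; # decode a copy, restored when the block exits
    if ( my $locale_encoding = encoding::_get_locale_encoding() ) {
        # Encode::decode takes a bare encoding name such as "UTF-8",
        # not an ":encoding(...)" layer string
        @ARGV = map { Encode::decode($locale_encoding, $_) } @ARGV;
    }
    use open ':locale';  # locale encoding for open() and the standard handles
    use locale;
    ...
}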
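
If relying on the private `encoding::_get_locale_encoding()` feels fragile, a similar sketch can ask the public `I18N::Langinfo` module for the codeset of the current locale instead. `decode_args` is a hypothetical helper name, not part of any module, and this assumes that the codeset name reported by `langinfo(CODESET)` is one that `Encode` recognizes; when it is not, the arguments are left as raw bytes.

```perl
use strict;
use warnings;
use Encode ();
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: decode a list of byte strings with a named encoding.
# Returns the input unchanged when Encode does not know the encoding.
sub decode_args {
    my ($encoding_name, @bytes) = @_;
    my $enc = Encode::find_encoding($encoding_name)
        or return @bytes;
    return map { $enc->decode($_) } @bytes;
}

# Codeset of the current LC_CTYPE locale, e.g. "UTF-8" or "ISO-8859-1".
my $codeset = langinfo(CODESET);

{
    # Decoded copy of @ARGV, visible only inside this block.
    local @ARGV = decode_args($codeset, @ARGV);
    # ... code that expects character strings in @ARGV goes here ...
}
```

Keeping the decoding in a helper that takes the encoding name explicitly makes it easy to check it against a known encoding, independently of whatever locale the script happens to run under.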
optional
  • Thanks, that's a good start, it works for `echo àé | perl -e 'use locale; print uc <>'`, but it's evidently not enough: it breaks `perl -le 'use locale; use open qw(:locale); print uc($ARGV[0])' é` which prints `é`: somehow `use open qw(:locale)` seems to cause the UTF-8 argument to be interpreted as latin1! – Gilles 'SO- stop being evil' May 28 '15 at 06:49
  • Yes, locale/open don't touch @ARGV; you have to do that yourself. See the update (which will be updated again) – optional May 28 '15 at 08:25