I am observing strange behaviour with IPC::Open3 arguments as part of a script.

I pass a string encoded in ISO-8859-15 as an argument. Just before open3() is called (literally the statement before) the string is correct (verified with print and Data::Dumper).

However once the subprocess is started the arguments are now UTF-8 encoded. I have verified this using the desired executable (freebcp) and a wrapper script. I ended up writing a wrapper script which converts all the arguments back to ISO-8859-15.

What causes this behaviour? LANG is set to en_AU.ISO-8859-15. It works correctly on other hosts. I cannot find any reference to binmode().

teambob
  • "It works correctly on other hosts." - then probably you should also describe to us the hosts where it works, and the host where it doesn't. – Dummy00001 Dec 18 '15 at 08:47
  • The host which does the conversion is RHEL and is using a compiled version of perl 5.20.2. My dev environment is Ubuntu 14.04 using stock perl. – teambob Dec 18 '15 at 09:01
  • Any `use open` with `:encoding` or `:locale`, or `-C` options to `perl`? Perl 5 by default doesn't do any charset conversion: the default always was and still is plain binary mode. But in your case the data don't seem to be binary. They are Unicode inside Perl (and thus converted when written out), meaning that somewhere you have told Perl to decode the data from binary. – Dummy00001 Dec 18 '15 at 09:22
  • Cont. If that is the case, then probably you should tell Perl explicitly to convert the data when writing them to `open3` file handles. – Dummy00001 Dec 18 '15 at 09:25
  • I should also point out that a test script just using open3() does no conversion. Very bizarre – teambob Dec 18 '15 at 09:26
  • @Dummy00001: I created a unicode string in perl. When I called print it converted it to the correct character set. However when I call open3() it converts to (leaves as?) UTF-8. Could you turn your comment into an answer and I will accept it – teambob Dec 20 '15 at 22:33

1 Answer

I have a string containing ISO-8859-15. Just before open3() is called (literally the statement before) the string is correct (verified with print and Data::Dumper).

However once the subprocess is started the arguments are now UTF-8 encoded.

LANG is set to en_AU.ISO-8859-15.

Perl 5 by default doesn't do any encoding conversion: strings are treated as plain byte arrays.

That holds until you tell Perl that a string contains Unicode, for example by calling decode() or by reading the string from a file handle that has an encoding layer attached (via binmode(), via open() flags, via use open with :encoding/:locale, or via the -C command-line switch).

Since your string is in ISO-8859-15 but comes out as UTF-8, Perl must be aware of its encoding. Somewhere you have told Perl the encoding of the string, and Perl has decoded it to Unicode, which it represents internally as UTF-8. That UTF-8 is what now seems to reach the subprocess through open3().
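To check whether this is what happens on your host, something like the following sketch may help. It is only an illustration: `child_sees` is a made-up helper name, and a child perl that hex-dumps its first argument stands in for freebcp.

```perl
use strict;
use warnings;
use Encode qw(decode);
use IPC::Open3;
use Symbol qw(gensym);

# Hypothetical helper: start a child perl through open3() and return the
# hex dump of the first argument exactly as the child received it.
sub child_sees {
    my ($arg) = @_;
    my $err = gensym;    # open3() needs a real glob for the child's STDERR
    my $pid = open3(my $in, my $out, $err,
        $^X, '-e', 'print unpack("H*", $ARGV[0])', $arg);
    close $in;
    my $hex = do { local $/; <$out> };
    waitpid $pid, 0;
    return $hex;
}

my $bytes = "\xA4";                           # euro sign as one ISO-8859-15 byte
my $chars = decode('ISO-8859-15', $bytes);    # character string U+20AC

print child_sees($bytes), "\n";    # a4
print child_sees($chars), "\n";    # e282ac
```

The second call hands the child the three UTF-8 bytes of the euro sign, which matches the symptom in the question. If your real script behaves like the second call, the argument was decoded somewhere before it reached open3().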

As a possible solution, try explicitly encoding the strings into the desired encoding (for example with Encode::encode()) before passing them on.
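A minimal sketch of that explicit conversion, assuming the arguments are already decoded character strings and the target is ISO-8859-15 as in the question. `run_with_encoded_args` is a hypothetical helper name, and a child perl again stands in for freebcp.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use IPC::Open3;
use Symbol qw(gensym);

# Hypothetical helper: encode each argument (not the program path) to the
# given encoding, run the program through open3(), and return its stdout.
# Assumes every argument is a decoded character string; running encode()
# on bytes that are already ISO-8859-15 could corrupt them.
sub run_with_encoded_args {
    my ($encoding, $program, @args) = @_;
    my @encoded = map { encode($encoding, $_) } @args;
    my $err = gensym;
    my $pid = open3(my $in, my $out, $err, $program, @encoded);
    close $in;
    my $output = do { local $/; <$out> };
    waitpid $pid, 0;
    return $output;
}

my $arg = decode('ISO-8859-15', "caf\xE9 \xA4");
print run_with_encoded_args('ISO-8859-15', $^X,
    '-e', 'print unpack("H*", $ARGV[0])', $arg), "\n";    # 636166e920a4
```

The hex dump shows single bytes e9 and a4 again, so the child receives ISO-8859-15 rather than UTF-8. With this in place the wrapper script that converts arguments back should no longer be needed.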

P.S. With the utf8::is_utf8() function you can try to track down when and how your strings get converted to Unicode, and check whether they really are Unicode.
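For that kind of debugging, a small helper along these lines might be enough. `describe` is a hypothetical name; call it at a few points in the script to see where the flag flips.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical debugging helper: report whether Perl's internal UTF8 flag
# is set on the string, its length in characters, and its code points.
sub describe {
    my ($label, $str) = @_;
    return sprintf "%s: utf8_flag=%s length=%d chars=%s",
        $label,
        (utf8::is_utf8($str) ? 'on' : 'off'),
        length($str),
        join(' ', map { sprintf '%02x', ord($_) } split //, $str);
}

my $bytes = "\xA4";                           # ISO-8859-15 byte for the euro sign
my $chars = decode('ISO-8859-15', $bytes);    # decoded to U+20AC

print describe('raw bytes', $bytes), "\n";    # raw bytes: utf8_flag=off length=1 chars=a4
print describe('decoded',   $chars), "\n";    # decoded: utf8_flag=on length=1 chars=20ac
```

Keep in mind that the flag only describes Perl's internal storage. A string with the flag off can still hold Latin-1 characters, so treat the output as a hint rather than proof.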

Dummy00001