Perl (wrong?) encoding of output file

Question

I am running ActivePerl 5.16.3 on Windows 7 (32-bit).

My (short) program massages an input text file (encoded in UTF-8). I wish the output encoding to be in Latin1, so my code is:

open (OUT, '>;encoding(Latin1)', "out.txt") || die "Cannot open output file: $!\n";
print OUT "$string\n";

yet the resulting file is still in UTF-8. What am I doing wrong?

Do you really have a semicolon in the open mode string? It should be a colon - `>:encoding(Latin1)` — Borodin, Feb 21 '14 at 10:19

score 2 · Answer 1 · answered Feb 21 '14 at 10:32

Firstly, the encoding layer is separated from the open mode by a colon, not a semicolon.

open OUT, '>:encoding(latin1)', "out.txt" or die "Cannot open output file: $!\n";

Secondly, Latin-1 can only encode a small subset of Unicode (the 256 code points U+0000 to U+00FF). Furthermore, the ASCII half of this subset is encoded identically in Latin-1 and UTF-8. We therefore have to use a test file with characters that are not encoded the same, e.g. \N{MULTIPLICATION SIGN} U+00D7 ×, which is \xD7 in Latin-1 and \xC3\x97 in UTF-8.
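
To see the difference without touching any files, the same character can be encoded both ways with the core Encode module. This is only a minimal sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character: U+00D7 MULTIPLICATION SIGN.
my $char = "\x{D7}";

# Encode the same character both ways and render the resulting bytes as hex.
my $latin1_hex = unpack 'H*', encode('ISO-8859-1', $char);
my $utf8_hex   = unpack 'H*', encode('UTF-8',      $char);

print "Latin-1: $latin1_hex\n";   # d7
print "UTF-8:   $utf8_hex\n";     # c397
```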

Also make sure that you actually decode the input file.
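
A missing decoding step can produce exactly the symptom described in the question. The sketch below uses the core Encode module on an in-memory string instead of file handles, which is an assumption made for brevity; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes of U+00D7, as they arrive from a handle with no decoding layer.
my $raw = "\xC3\x97";

# Without decoding, Perl sees two characters (U+00C3 and U+0097). Both exist in
# Latin-1, so the "recoded" result is byte-for-byte identical to the UTF-8 input.
my $not_decoded = unpack 'H*', encode('ISO-8859-1', $raw);

# With decoding, Perl sees one character (U+00D7), which becomes a single byte.
my $decoded = unpack 'H*', encode('ISO-8859-1', decode('UTF-8', $raw));

print "without decode: $not_decoded\n";   # c397
print "with decode:    $decoded\n";       # d7
```

In other words, an undecoded UTF-8 file written through a Latin-1 output layer still looks like UTF-8 on disk.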

Here is how you could generate the test file:

$ perl -CSA -E'say "\N{U+00D7}"' > input.txt

Here is how you can test that you are properly recoding the file:

use strict;
use warnings;
use autodie;

open my $in, "<:encoding(UTF-8)", "input.txt";
open my $out, ">:encoding(latin1)", "output.txt";

while (<$in>) {
    print { $out } $_;
}

The input.txt and output.txt files should afterwards have different lengths (3 bytes → 2 bytes, counting the trailing newline).
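
One way to check the lengths is to read both files back in raw mode and print their bytes. The helper name raw_bytes below is hypothetical and not part of the answer; the script simply skips any file that does not exist.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical helper (not from the answer): return a file's contents as raw bytes.
sub raw_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path;   # :raw disables decoding and newline translation
    local $/;                      # slurp the whole file at once
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Report the size and a hex dump of each file that exists.
for my $file (grep { -e $_ } qw(input.txt output.txt)) {
    my $bytes = raw_bytes($file);
    printf "%-10s %d bytes: %s\n", $file, length($bytes), unpack('H*', $bytes);
}
```

With the test file generated above on a system that uses Unix line endings, this should report c3970a for input.txt and d70a for output.txt. On Windows the byte counts may differ because of newline translation.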
