I have been given a file, probably encoded in Latin-1 (ISO 8859-1), and I need to do some conversions and data mining on it. The output is supposed to be in UTF-8. I have tried just about everything I could find about encoding conversion in Perl, but none of it produced any usable output.
I know that use utf8; does nothing for this to begin with. I have tried the Encode package, which looked promising:
open FILE, '<', $ARGV[0] or die $!;
my %tmp = ();
my $last_num = 0;
while (<FILE>) {
    $_ = decode('ISO-8859-1', encode('UTF-8', $_));
    chomp;
    next unless length;
    process($_);
}
I tried that in every combination I could think of, and also threw in binmode(STDOUT, ":utf8"); and open FILE, '<:encoding(ISO-8859-1)', $ARGV[0] or die $!; and much more. The results were either scrambled umlauts, an error message like "\xC3 is not a valid UTF-8 character", or even mixed text (some in UTF-8, some in Latin-1).
All I want is a simple way to read in a Latin-1 text file and produce UTF-8 output on the console via print. Is there a simple way to do that in Perl?
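For reference, here is a minimal sketch of one common approach, assuming the file really is ISO 8859-1: let the PerlIO layers handle the conversion, so the input is decoded once when read and the output is encoded once when printed. The helper name convert_latin1_file is hypothetical, and the plain print stands in for the process() call from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): copy a Latin-1
# encoded file to an output handle as UTF-8, skipping empty lines.
sub convert_latin1_file {
    my ($path, $out) = @_;

    # The input layer decodes Latin-1 bytes into Perl character strings.
    open my $in, '<:encoding(ISO-8859-1)', $path
        or die "Cannot open $path: $!";

    # The output layer encodes characters as UTF-8, exactly once.
    binmode $out, ':encoding(UTF-8)';

    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;
        # Stand-in for process($line) from the original snippet.
        print {$out} "$line\n";
    }
    close $in;
}

# Run as a script only when a file name is given on the command line.
convert_latin1_file($ARGV[0], \*STDOUT) if @ARGV;
```

If the Encode functions are used by hand instead of I/O layers, the direction would be the reverse of the attempt above: decode('ISO-8859-1', ...) on the bytes read, then encode('UTF-8', ...) just before printing, and without an encoding layer on the same handle, to avoid double encoding.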