I am running ActivePerl 5.14 on Windows 7. I am trying to write a program that reads in a conversion table, then processes a file and replaces certain patterns with other patterns, all in Unicode (UTF-8). Here is the beginning of the program:

#!/usr/local/bin/perl
# Load a conversion table from CONVTABLE to %ConvTable.
# Then find matches in a file and convert them.
use strict;
use warnings;
use Encode;
use 5.014;
use utf8;
use autodie; 
use warnings    qw< FATAL  utf8     >;
use open        qw< :std  :utf8     >;
use charnames   qw< :full >;
use feature     qw< unicode_strings >;

my ($i,$j,$InputFile, $OutputFile,$word,$from,$to,$linetoprint);
my (@line, @lineout); 
my %ConvTable;    # Conversion hash
print 'Conversion table: opening file: E:\My Documents\Perl\Conversion table.txt'."\n";
my $sta= open (CONVTABLE, "<:encoding(utf8)", 'E:\My Documents\Perl\Conversion table.txt');
binmode STDOUT, ':utf8';    # output should be in UTF-8
# Load conversion hash
while (<CONVTABLE>) {
    chomp;
    print "$_\n"; # etc ...
# etc ...

At this point, Perl reports:

Wide character in print at (eval 155)[E:/Active Perl/lib/Perl5DB.pl:640] line 2, <CONVTABLE> line 1, etc...

Why is that? I thought I had implemented everything needed to handle Unicode strings correctly, including decoding from and encoding to UTF-8. How can I fix it?

TIA

Helen

ikegami
Helen Craigman
  • Your code is fine. What happens if you run it outside of the debugger? – ikegami Feb 15 '12 at 20:01
  • By the way, the `binmode STDOUT, ':utf8';` is redundant: `use open qw< :std :utf8 >;` already does the same thing. – ikegami Feb 15 '12 at 20:02
  • By the way, the `use feature qw< unicode_strings >;` (which has no effect on that code) is redundant: `use 5.014;` already enables it. – ikegami Feb 15 '12 at 20:03
  • [crossposted on PerlMonks](http://www.perlmonks.org/?node_id=954045) – ikegami Feb 15 '12 at 20:07

2 Answers

The Perl debugger has its own output handle that is distinct from STDOUT (although it may ultimately go to the same place as STDOUT). You'll also want to do something like this near the beginning of your script:

binmode $DB::OUT, ':utf8' if $DB::OUT;
socket puppet
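
For context, here is a minimal sketch of where that line could sit in a script. It assumes the script may be run both with and without `perl -d`. Outside the debugger `$DB::OUT` is undefined, so the guard turns the line into a no-op. The sample string is made up for illustration and is not taken from the question's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use open qw< :std :utf8 >;   # UTF-8 layers for STDIN, STDOUT and STDERR

# The debugger prints through its own handle, $DB::OUT, which the
# "use open" pragma above does not touch. Outside "perl -d" the
# variable is undefined, so the guard skips the call.
binmode $DB::OUT, ':utf8' if $DB::OUT;

# Hypothetical sample data: four Hebrew letters, all above U+00FF.
my $sample = "\x{5E9}\x{5DC}\x{5D5}\x{5DD}";
print "$sample\n";
```

Written with `\x{...}` escapes, the sample does not depend on the encoding of the source file itself.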
  • Thank you. This solved the "Wide character" error in the debugger. But now I have further problems with the program: it doesn't seem to handle the Unicode characters correctly. I wonder if I should continue on this question or open a new one. – Helen Craigman Feb 15 '12 at 20:38
  • Open a new one. This isn't a forum discussion, it's a Q&A site. Combining multiple questions into a single one will just confuse people who later come looking for these same answers. – mpeters Feb 16 '12 at 14:55
  • OK, I will mark this answer as the solution, and open a new question. – Helen Craigman Feb 17 '12 at 00:22

I suspect that the problem is in some part of the code that you haven't shown us. I base this suspicion on the following facts:

  1. The error message you quote says `at (eval 155)`. There are no `eval`s in your code.

  2. The code you have shown us above does not produce a "wide character" warning when I run it, even if the input contains Unicode characters. The only way I can make it produce one is to comment out both the `use open` line and the `binmode STDOUT` line.

Admittedly, my testing environment is not exactly identical to yours: I'm on Linux, and my Perl is only v5.10.1, meaning that I had to lower the version requirement and turn off the unicode_strings feature (not that you're actually using it). Still, I very much suspect that the problem is not in the code you've posted.
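
The observation in point 2 can be reproduced in isolation. The following is only a sketch, not the test actually run for this answer: it prints a made-up wide string to two in-memory handles, one opened without an encoding layer and one with an explicit `:encoding(UTF-8)` layer, and counts the "Wide character" warnings each produces. The helper name `try_print` is hypothetical, and the sketch assumes no default layers are in force (no `use open`, no `-C` switch or `PERL_UNICODE` setting).

```perl
use strict;
use warnings;

# Made-up test string: four Hebrew letters, all above U+00FF.
my $wide = "\x{5E9}\x{5DC}\x{5D5}\x{5DD}";

# Print $wide to an in-memory handle opened with the given mode and
# return the number of "Wide character" warnings that were raised.
sub try_print {
    my ($mode) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $buffer = '';
    open my $fh, $mode, \$buffer or die "open failed: $!";
    print {$fh} $wide;
    close $fh;
    return scalar grep { /Wide character/ } @warnings;
}

my $raw_warnings   = try_print('>');                   # no encoding layer
my $layer_warnings = try_print('>:encoding(UTF-8)');   # explicit layer

print "without layer: $raw_warnings warning(s)\n";
print "with layer:    $layer_warnings warning(s)\n";
```

The warning depends only on whether the handle being printed to carries a UTF-8 layer, which is why a handle the script never configured, such as the debugger's, can still trigger it.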

Ilmari Karonen
  • Apparently the eval comes from the debugger (I've been running the program under Padre). When I run the program without the debugger, it says: `Name "main::INPUT" used only once: possible typo at Conv.pl line 58. Name "main::CONVTABLE" used only once: possible typo at Conv.pl line 26. Name "main::OUTPUT" used only once: possible typo at Conv.pl line 71. Conversion table: opening file: E:\My Documents\Perl\Conversion table.txt ∩╗┐England, Germany he, she the, HOMHOM <╫ö╫ó╫¿╫│11> <╫ö╫ó╫£╫Ö╫ò╫ƒ> <╫ù ╫Ö╫│ ╫¥> <╫ù╫Ö╫Ö╫¥>` – Helen Craigman Feb 15 '12 at 20:00