4

I am struggling to work with a path containing non-English characters (ActiveState Perl, Windows XP). How do I open, write, copy, etc. a file located in a path with, let's say, Greek, Russian, or accented French characters? Let's say the directory I want to copy my text.txt file to is: C:\Documents and Settings\στα\Desktop

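use File::Spec;
use File::Spec::Functions qw(catfile);  # provides catfile()
use File::Copy qw(copy);                # provides copy()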
my $save = File::Spec->canonpath( $mw->chooseDirectory() );

my $file = catfile($save , "renamed_text.txt");

my $input = "üüü\text.txt";
copy ($input, $file) or die "File cannot be copied.";
innaM
Richard
  • Your code snippet has a problem: You want "\\text.txt" not "\text.txt". This probably isn't your final problem though. – Leolo Nov 21 '10 at 06:08
  • I needed to add at start of script: ${^WIDE_SYSTEM_CALLS}=1; – Harry Jun 08 '21 at 07:33

5 Answers

3

I don't have privileges to vote up the answer from Chris Dolan but I have resolved this problem for path names here in Japan with the same solution based on Win32::Codepage.

This probably needs confirmation, but I think Perl assumes UTF-8 for all non-ASCII path names. On Linux and OS X this works fine because OS path names are encoded in UTF-8. On older versions of Windows (pre Windows 7?), however, path names are encoded in the locale's code page (e.g. Shift-JIS here in Japan), so every Perl call that returns path names with non-ASCII characters gets them garbled.

The solution I used was to look up the locale encoding with Win32::Codepage and convert path names from that encoding to UTF-8 when reading files. Then, when writing (or updating) files, I converted them back to the locale encoding.
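As a rough illustration of that round trip, here is a minimal sketch using the core Encode module. It is not the code from the original project: the encoding name is hard-coded to cp932 (Shift-JIS) as an assumption, where a real script would ask Win32::Codepage, and the helper names name_from_os and name_for_os are made up for this example. It also decodes to Perl character strings rather than to UTF-8 bytes, which is a slightly different choice from the one described above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption for illustration: the system code page is cp932 (Shift-JIS).
# On a real system this name would come from Win32::Codepage.
my $locale_encoding = 'cp932';

# Bytes in the locale encoding (e.g. from readdir) -> Perl character string.
sub name_from_os {
    my ($bytes) = @_;
    return decode($locale_encoding, $bytes);
}

# Perl character string -> bytes in the locale encoding (for open, copy, ...).
sub name_for_os {
    my ($chars) = @_;
    return encode($locale_encoding, $chars);
}

# Two kanji as Shift-JIS bytes, as a directory listing might return them.
my $raw   = "\x93\xFA\x96\x7B";
my $chars = name_from_os($raw);    # two characters: U+65E5 U+672C
my $bytes = name_for_os($chars);   # back to the same four bytes
```

The idea is to keep names as character strings inside the program and convert only at the boundary with the filesystem.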

LozzerJP
  • You got one point incorrect in your answer above. Perl assumes 8-bit Latin-1 by default for encoding strings. Like Shift-JIS, Latin-1 is a superset of ASCII. – Chris Dolan Dec 23 '10 at 15:27
  • Sorry for the mistake above. Yep, it's Latin 1. The main point though is Perl's use of UTF8, which messes up path names on Windows systems. I still find it amazing that you cannot use a simple "open" command in Perl (on non-Latin 1 Windows systems) without messing about with Win32::Codepage. – LozzerJP Dec 27 '10 at 11:15
2

I had this same problem in a project a few years back (our PAR-packed GUI app had to work under Shift-JIS encoding). I tried LOTS of techniques to make Perl 5.8 do this right automatically. In the end, my tedious-but-effective solution was to encode EVERY filename just before passing it to the builtins.

First, set up the utility function:

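use Encode;
use Win32::Codepage;

# Look up the system code page and resolve it to a canonical Encode name;
# fall back to an empty string if either lookup fails.
my $encoding = Win32::Codepage::get_encoding() || q{};
if ($encoding) {
    $encoding = Encode::resolve_alias($encoding) || q{};
}

# Encode a filename to the system encoding, or return it unchanged
# when no encoding could be determined.
sub encode_filename {
    my ($filename) = @_;
    return $encoding ? encode($encoding, $filename) : $filename;
}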

Then, use it everywhere:

next if (! -d encode_filename($tmpldir));
my $file = SWF::File->new(encode_filename($dest));
@entries = File::Slurp::read_dir(encode_filename($srcdir));
etc...

I even wrote a little checker to make sure I used it everywhere!

egrep "\-[a-zA-Z] |open[^_]|[^ ]parse|unlink|symlink|mkdir[^_]|mkpath|rename[^\']|File::Copy::copy|rmtree|getTemplate[^D]|write_file|read_file|read_dir" *.pl `find lib -name '*.pm'` | grep -v encode_filename | egrep -v '^[^:]+: *(\#|_announce|debug)'

If you miss even one, you'll get the "Wide-character" warning at runtime...
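Applied to the copy from the question, the same idea might look like the sketch below. It is only an illustration under stated assumptions: the encoding is hard-coded to cp1253 (the Greek ANSI code page) instead of being detected with Win32::Codepage, this encode_filename is a simplified stand-in, and copy_into is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Copy qw(copy);
use File::Spec::Functions qw(catfile);

# Assumption: a Greek Windows system whose ANSI code page is cp1253.
my $encoding = 'cp1253';

# Simplified stand-in: always encodes to the assumed code page.
sub encode_filename {
    my ($filename) = @_;
    return encode($encoding, $filename);
}

# Copy $src into directory $dir under the name $name. All three arguments
# are Perl character strings; they are encoded only at the call to copy().
sub copy_into {
    my ($src, $dir, $name) = @_;
    my $dest = catfile($dir, $name);
    copy(encode_filename($src), encode_filename($dest))
        or die "File cannot be copied: $!";
    return $dest;
}
```

A call could then look like copy_into('text.txt', "C:\\Documents and Settings\\\x{3C3}\x{3C4}\x{3B1}\\Desktop", 'renamed_text.txt'), with the Greek directory name written as character escapes so that it does not depend on the encoding of the script file.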

Chris Dolan
0

I discovered I had to disable UAC (User Account Control) on Microsoft Windows Vista before I could successfully install either Win32::Locale or Win32::Codepage. (Thank you, Chris Dolan, for writing the latter module.)

Jim Monty
0

I also had problems with UAC (User Account Control) on Windows 7 and newer. I finally found out that, since Windows Vista, the required Registry key can only be opened with read permission. You can easily patch Win32::Codepage to work without administrative privileges if you open the file in your favourite editor and replace:

  $codekey = Win32::TieRegistry->new($CODEPAGE_REGISTRY_KEY,
                                     { Delimiter => "/" }
                                    );

with:

  $codekey = Win32::TieRegistry->new($CODEPAGE_REGISTRY_KEY,
                                     { Access=>"KEY_READ", Delimiter => "/" }
                                    );

This has helped on my installation.

Wolfi
0

Perl's native functions cannot be used in this case. Use the functions in the Win32 module that support Unicode characters. Win32 was first released with perl v5.8.7.
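One function from that module that may help is Win32::GetANSIPathName, which returns a version of a path that can be represented in the system code page (possibly the short 8.3 name). The sketch below is a guess at how it could be wrapped, not a tested solution for the Greek case: ansi_path is a hypothetical helper name, and outside Windows it simply returns the path unchanged so the script still loads.

```perl
use strict;
use warnings;

# Return a form of $path that the byte-oriented builtins can use.
# On Windows this asks Win32::GetANSIPathName; on any other system
# the path is returned unchanged.
sub ansi_path {
    my ($path) = @_;
    return $path unless $^O eq 'MSWin32';
    require Win32;    # loaded lazily so the script also compiles elsewhere
    return Win32::GetANSIPathName($path);
}
```

According to the Win32 documentation, the function returns its argument unchanged when the file does not exist, so it is mainly useful for files and directories that are already on disk.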

Alan Haggai Alavi
  • Thank you for your advice. I changed the script to unless ( Win32::CopyFile($input, $ansi_path, 1) ) { my $err = $^E; if ( $err == ERROR_ALREADY_EXISTS ) { warn "Directory exists, no problem\n"; } else { die Win32::FormatMessage($^E); } } BUT IT DOESN'T WORK FOR GREEK!! ANY IDEA? – Richard Nov 20 '10 at 12:00
  • I am sorry. I do not have a Windows system to test on, presently. If I am able to test, I will update my answer. – Alan Haggai Alavi Nov 20 '10 at 15:06
  • Alan's answer is on the right track, but is incorrect. You *can* use Perl's native functions with a wide character encoding, but you must encode the filenames to the system encoding before handing the filenames to Perl. See my answer. – Chris Dolan Dec 19 '10 at 06:13