I'm currently learning the art of Unicode programming and applying it to a personal project. I soon realized how difficult it is to get right, and even to tell whether you got it right: if the tool you check with is wrong, you can misjudge the results of your work.
My small goal in this exercise is to understand what I should pass to mkdir versus what is good for File::Path::make_path. In other words: what do they expect? Will they handle the encoding depending on the locale, or should I do it for them?
I wrote the following script, which takes arguments from @ARGV and, for each of them, creates the directory $_ using both functions and both the encoded and decoded forms.
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use v5.16;
use Encode;
use Encode::Locale;
use File::Path qw/make_path/;
use File::Spec;
# Everything under the './tree' directory
mkdir 'tree';
mkdir File::Spec->catdir('tree', $_)
    for ('mkdir', 'mkdir_enc', 'make_path', 'make_path_enc');
foreach (map decode(locale => $_) => @ARGV) {
    mkdir File::Spec->catdir('tree', 'mkdir', $_);
    mkdir encode(locale_fs => File::Spec->catdir('tree', 'mkdir_enc', $_));
    make_path(File::Spec->catdir('tree', 'make_path', $_));
    make_path(encode(locale_fs => File::Spec->catdir('tree', 'make_path_enc', $_)));
}
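To make the "decoded" versus "encoded" distinction concrete, here is a minimal sketch that never touches the filesystem. It assumes UTF-8 as the target encoding (the script above asks Encode::Locale for the real one), and it writes the arrows as escapes so the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Decoded form: a character string, as produced by decode(locale => ...).
my $decoded = "a\x{2192}b\x{2190}c";        # a→b←c

# Encoded form: a byte string. UTF-8 is an assumption made for this sketch.
my $encoded = encode('UTF-8', $decoded);

printf "decoded: %d characters\n", length $decoded;   # 5
printf "encoded: %d bytes\n",      length $encoded;   # 9, each arrow takes 3 bytes
```

The two strings differ in length, so they really are different values as far as Perl is concerned, which is why I expected one of the two calls to misbehave.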
I executed the script as follows:
./unicode_mkdir.pl a→b←c
What I would expect is:
- Either tree/mkdir [x]or tree/mkdir_enc contains directories with gibberish names;
- Either tree/make_path [x]or tree/make_path_enc contains directories with gibberish names.
To my great surprise, I found that all four versions work properly. I verified it with find:
$ find tree
tree
tree/mkdir_enc
tree/mkdir_enc/a→b←c
tree/mkdir
tree/mkdir/a→b←c
tree/make_path_enc
tree/make_path_enc/a→b←c
tree/make_path
tree/make_path/a→b←c
I realized that the tree command gets it so wrong… (a quite common disease), but at least I could see that the results are all the same:
$ tree tree
tree
├── make_path
│   └── a\342\206\222b\342\206\220c
├── make_path_enc
│   └── a\342\206\222b\342\206\220c
├── mkdir
│   └── a\342\206\222b\342\206\220c
└── mkdir_enc
    └── a\342\206\222b\342\206\220c
8 directories, 0 files
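The octal escapes printed by tree can be checked by hand. The following is a small sketch, where unescape_octal is a hypothetical helper written only for this check: it turns the escapes back into raw bytes and decodes them as UTF-8, on the assumption that this is the encoding in play.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn tree-style \NNN octal escapes back into raw bytes.
sub unescape_octal {
    my ($name) = @_;
    $name =~ s/\\([0-7]{3})/chr(oct($1))/ge;
    return $name;
}

my $bytes = unescape_octal('a\342\206\222b\342\206\220c');
my $chars = decode('UTF-8', $bytes);    # assumes the names are UTF-8

# Print the code point of every decoded character.
printf "U+%04X ", ord($_) for split //, $chars;
print "\n";
# prints: U+0061 U+2192 U+0062 U+2190 U+0063
```

U+2192 and U+2190 are the right and left arrows, so the escaped names are the UTF-8 bytes of a→b←c in all four directories.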
An ls -R command seems to confirm it.
$ ls -R tree
tree:
make_path make_path_enc mkdir mkdir_enc
tree/make_path:
a→b←c
tree/make_path/a→b←c:
tree/make_path_enc:
a→b←c
tree/make_path_enc/a→b←c:
tree/mkdir:
a→b←c
tree/mkdir/a→b←c:
tree/mkdir_enc:
a→b←c
tree/mkdir_enc/a→b←c:
So my questions are:
Am I doing it right code-wise ('course not)?
Am I doing it right filesystem-wise?
How can mkdir and make_path figure out and fix the wrong one?
Or maybe I was just "reverse-lucky" (the kind of luck that keeps you from noticing your error because things happen to work in your case)? In that case, how can I test it effectively?
Any hint?
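One test I can think of is sketched below. It rests on two assumptions: that the filesystem stores names as raw bytes (as typical Linux filesystems do), and that Perl's mkdir hands the string's internal bytes to the OS without consulting the locale. It uses é (U+00E9) rather than an arrow, because a character below U+0100 can be held internally either as one Latin-1 byte or as two UTF-8 bytes, whereas the arrows can only be held as UTF-8.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory, removed when the script exits.
my $base = tempdir(CLEANUP => 1);

my $latin1 = "caf\x{e9}";   # U+00E9, held internally as the single byte E9
my $utf8   = $latin1;
utf8::upgrade($utf8);       # same characters, held internally as C3 A9

# Perl considers the two strings equal...
print "strings are equal\n" if $latin1 eq $utf8;

# ...but the directory names may differ if mkdir passes internal bytes along.
for my $name ($latin1, $utf8) {
    mkdir "$base/$name" or warn "mkdir failed: $!\n";
}

# List what was created, showing non-ASCII bytes as \xNN.
opendir my $dh, $base or die "opendir: $!";
my @created = sort grep { !/\A\.\.?\z/ } readdir $dh;
closedir $dh;

for my $entry (@created) {
    (my $shown = $entry) =~ s{([^\x20-\x7e])}{sprintf '\\x%02X', ord($1)}ge;
    print "$shown\n";
}
```

If two different entries show up, the decoded form is not being encoded on my behalf, and the arrows only worked because their internal representation happens to coincide with their UTF-8 encoding.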