7

I'm trying to sort lists of names with Perl with a specific letter order to perform some special features.
The sorting would be working the same way as sort { $a cmp $b } but with a different succession of letters.
For example, ordering with the arbitrary character order "abdrtwsuiopqe987654" ...

I tried to deal with sort { $a myFunction $b } but I'm newbie with Perl and I don't see how to organize correctly myFunction to get what I want.

  • Is there a specific function (a package) which provide this functionnality?
  • Do you have an example of a custom sorting function dealing with strings ?
  • Do you know how (or in which source file) is the cmp function implemented with Perl to see how it works ?
JeanJouX
  • 2,555
  • 1
  • 25
  • 37

2 Answers2

11

The following is probably the fastest[1]:

sub my_compare($$) {
    $_[0] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
       cmp
    $_[1] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
}

my @sorted = sort my_compare @unsorted;

Or if you want something more dynamic, the following might be the fastest[2]:

my @syms = split //, 'abdrtwsuiopqe987654';
my @map; $map[ord($syms[$_])] = $_ for 0..$#syms;

sub my_compare($$) {
    (pack 'C*', map $map[ord($_)], unpack 'C*', $_[0])
       cmp
    (pack 'C*', map $map[ord($_)], unpack 'C*', $_[1])
}

my @sorted = sort my_compare @unsorted;

We could compare character by character, but that will be far slower.

use List::Util qw( min );

my @syms = split //, 'abdrtwsuiopqe987654';
my @map; $map[ord($syms[$_])] = $_ for 0..$#syms;

sub my_compare($$) {
    my $l0 = length($_[0]);
    my $l1 = length($_[1]);
    for (0..min($l0, $l1)) {
       my $ch0 = $map[ord(substr($_[0], $_, 1))];
       my $ch1 = $map[ord(substr($_[1], $_, 1))];
       return -1 if $ch0 < $ch1;
       return +1 if $ch0 > $ch1;
    }

    return -1 if $l0 < $l1;
    return +1 if $l0 > $l1;
    return 0;
}

my @sorted = sort my_compare @unsorted;

  1. Technically, it can be made faster using GRT.

     my @sorted =
        map /\0(.*)/s,
        sort
        map { tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r . "\0" . $_ }
        @unsorted;
    
  2. Technically, it can be made faster using GRT.

     my @sorted =
        map /\0(.*)/s,
        sort
        map { ( pack 'C*', map $map[ord($_)], unpack 'C*', $_ ) . "\0" . $_ }
        @unsorted;
    

cmp is implemented by the scmp operator.

$ perl -MO=Concise,-exec -e'$x cmp $y'
1  <0> enter
2  <;> nextstate(main 1 -e:1) v:{
3  <#> gvsv[*x] s
4  <#> gvsv[*y] s
5  <2> scmp[t3] vK/2
6  <@> leave[1 ref] vKP/REFC

The scmp operator is implemented by the pp_scmp function in pp.c, which is really just a wrapper for sv_cmp_flags in sv.c when use locale; isn't in effect. sv_cmp_flags either uses C library function memcmp or a UTF-8 aware version (depending on the type of scalar).

ikegami
  • 367,544
  • 15
  • 269
  • 518
1
use Sort::Key qw(keysort);
my @sorted = keysort { tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r } @data;

Or in older perls not supporting the r flag in tr/.../.../r

my @sorted = keysort { my $key = $_;
                       $key =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/;
                       $key } @data;

You can also create an specialized sort subroutine for that kind of data as follows:

use Sort::Key::Maker 'my_special_sort',
                     sub { tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r },
                     qw(string);

my @sorted = my_special_sort @data;
my @sorted2 = my_special_sort @data2;
salva
  • 9,943
  • 4
  • 29
  • 57