
I have a file which looks like this:

80,1p21
81,19q13
82,6p12.3
83,Xp11.22
84,3pter-q21
86,3q26.33
87,14q24.1-q24.2|14q24|14q22-q24
88,1q42-q43
89,11q13.1
90,2q23-q24
91,12q13
92,2q22.3
93,3p22
94,12q11-q14
95,3p21.1
97,14q24.3
98,2p16.2

I want to sort the lines by the second column, with the first column moving along with it. When I use Perl's 'sort', it doesn't work because it says the values aren't numeric. Is there a way to sort alphanumerically in Perl?

Peter Mortensen
Jordan
  • Also, if you have them in a file already, you might try the unix sort command. A perl script to sort a file is overkill. – Mark Tozzi Jun 19 '12 at 14:08
  • This question has been asked here MANY times, and it's well covered by the documentation. Now let me ask you: are you saying that ignoring search is the right thing to do for beginners? In my opinion, it's quite the opposite: it's even more important for beginners to start using search ASAP. – raina77ow Jun 19 '12 at 16:30
  • @lanzz: the Perl sort documentation didn't help much. If I use a lexical sort, it orders 1p21 19q13 6p12.3 11q13.1 2q23-q24 as 11q13.1 19q13 1p21 2q23-q24 6p12.3, which is not what I need. – Jordan Jun 19 '12 at 17:15
  • @MarkTozzi: I tried the unix sort too. It seems to have the same problem as Perl's sort. – Jordan Jun 19 '12 at 17:16
  • @Anish: You are not restricted to doing lexical sort in your custom function. Your custom function receives two full records that need to be compared, e.g. `89,11q13.1` and `98,2p16.2`. You will need to implement the code to extract the second column from each of the two records, and compare these second columns. If even this is too hard, then you have more fundamental skills to improve on first, and it is beyond the scope of this site and our help to develop these skills in you. – lanzz Jun 19 '12 at 20:24
  • @LenJaffe: I specifically avoid giving ready-to-use solutions as answers to any beginner questions. Beginners do not benefit from being given their prize on a platter, even if accompanied with a detailed explanation. Beginner questions often require non-trivial understanding of the matter, which will not be acquired by just copying a ready answer from a site. – lanzz Jun 19 '12 at 20:27
  • @lanzz: I see what you mean. I tried it myself and got the answer. – Jordan Jun 20 '12 at 19:15
  • @lanzz: So you're the SO police enforcing an "Ask non-trivial questions or RTFM" policy? Use SO to teach more than "Go get lost in the 5000 pages of perl docs." Teach "Here's how to find that in the 5000 pages of docs" and "Here are other relevant things to consider while choosing the right answer for your question". Teach fishing skills; don't just tell them to go fish. – Len Jaffe Jun 21 '12 at 16:46

4 Answers


If you read the documentation for sort, you'll see that you don't need to do a numeric sort in Perl. You can do string comparisons too.

@sorted = sort { $a cmp $b } @unsorted;
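A quick check of what plain `cmp` does with a few second-column values from the question (the list is just a sample picked for illustration). Strings are compared character by character, so the chromosome numbers are not treated as numbers:

```perl
use strict;
use warnings;

# A few second-column values taken from the question's data
my @loci = ('1p21', '19q13', '6p12.3', '11q13.1', '2q23-q24');

# Plain string comparison: "19q13" comes before "6p12.3" because "1" lt "6",
# and before "1p21" because "9" lt "p" in ASCII
my @sorted = sort { $a cmp $b } @loci;
print "@sorted\n";    # 11q13.1 19q13 1p21 2q23-q24 6p12.3
```

This reproduces the ordering the asker reports in the comments.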

But that still leaves you with a problem: for example, 19q will sort before 6p. So you can write your own sort function that makes whatever transformations you want before doing the comparison.

@sorted = sort my_complex_sort @unsorted;

sub my_complex_sort {
  # code that compares $a and $b and returns -1, 0 or 1 as appropriate
  # It's probably best in most cases to do the actual comparison using cmp or <=>

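  # Extract the digits following the first comma (the chromosome number).
  # Assumes every record has one; a line like 83,Xp11.22 gives undef here.
  my ($number_a) = $a =~ /,(\d+)/;
  my ($number_b) = $b =~ /,(\d+)/;

  # Extract the letter following those digits (p or q); [a-z] is a character class
  my ($letter_a) = $a =~ /,\d+([a-z])/;
  my ($letter_b) = $b =~ /,\d+([a-z])/;

  # Compare and return. Use || rather than "or": "or" binds more loosely than
  # return, so with "or" the letter comparison would never be reached.
  return $number_a <=> $number_b || $letter_a cmp $letter_b;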
}
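The comparator above only looks at the chromosome number and the p/q letter, and it assumes every locus starts with a number. Below is a sketch of one way to extend it, under assumptions that are mine rather than the answer's: loci with no leading number (such as Xp11.22) are placed after all numbered ones, and ties are broken by comparing the rest of the locus (arm plus band) as a plain string. `by_locus` is a hypothetical name, and every record is assumed to contain a comma.

```perl
use strict;
use warnings;

# Hypothetical comparator for "id,locus" records (assumptions described above)
sub by_locus {
    # Split each record into the locus's leading digits and everything after them
    my ($num_a, $rest_a) = $a =~ /,(\d*)(.*)/;
    my ($num_b, $rest_b) = $b =~ /,(\d*)(.*)/;

    # Records with a chromosome number come before those without one
    my $has_a = length($num_a) ? 1 : 0;
    my $has_b = length($num_b) ? 1 : 0;
    return $has_b <=> $has_a if $has_a != $has_b;

    # Same kind: compare the numbers (if any), then the remaining text as a string
    my $by_num = $has_a ? $num_a <=> $num_b : 0;
    return $by_num || $rest_a cmp $rest_b;
}

my @records = (
    '80,1p21', '81,19q13', '82,6p12.3', '83,Xp11.22', '84,3pter-q21',
    '86,3q26.33', '87,14q24.1-q24.2|14q24|14q22-q24', '88,1q42-q43',
    '89,11q13.1', '90,2q23-q24', '91,12q13', '92,2q22.3', '93,3p22',
    '94,12q11-q14', '95,3p21.1', '97,14q24.3', '98,2p16.2',
);

my @sorted = sort by_locus @records;
print "$_\n" for @sorted;
```

Because the band part is compared as text, a band such as p3 would sort after p21; that is acceptable for this data but worth knowing.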
Zaid
Dave Cross
  • Another way to do it is with Sort::Versions: `use Sort::Versions;` then `@allFileArraySorted = sort { versioncmp($a, $b) } @allFileArray;` – Sam B Aug 27 '19 at 19:44
#!/usr/bin/env perl

use strict;
use warnings;

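# Split each line into [first column, chromosome number, rest of the locus];
# chomp drops the newline, and lines that don't match are skipped
my @datas   = map { chomp; /^(\d+),(\d*)(.*)$/ ? [$1, $2, $3] : () } <DATA>;
# Sort on the chromosome number (an empty one, as in Xp11.22, counts as 0),
# then on the rest of the locus as a string
my @res     = sort { ($a->[1] || 0) <=> ($b->[1] || 0) or $a->[2] cmp $b->[2] } @datas;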
foreach my $data (@res) {
    my ($x, $y, $z) = @{$data};
    print "$x,$y$z\n";
}

__DATA__
80,1p21
81,19q13
82,6p12.3
83,Xp11.22
84,3pter-q21
86,3q26.33
87,14q24.1-q24.2|14q24|14q22-q24
88,1q42-q43
89,11q13.1
90,2q23-q24
91,12q13
92,2q22.3
93,3p22
94,12q11-q14
95,3p21.1
97,14q24.3
98,2p16.2 
cdtits

I actually found the answer to this. The code looks a bit complicated though.

#!/usr/bin/env perl

use strict;  
use warnings;

sub main {
    # Expect exactly one argument: the input file
    die "Usage: perl hashofhash_sort.pl <filename>\n" unless @ARGV == 1;
    my $file = $ARGV[0];

    # Three-argument open with a lexical filehandle
    open(my $in, '<', $file) or die "Error!! Cannot open the $file file: $!\n";
    my @file = <$in>;
    close($in);
    chomp @file;

    # %chromosome: chromosome number => arm (p/q) => list of band numbers
    # %loci_entrez: full locus (e.g. 12q13) => list of Entrez gene IDs
    my (%chromosome, %loci_entrez);

    foreach my $line (@file) {
        next unless $line =~ /(\d+),(.+)/;

        # Entrez gene
        my $entrez_gene = $1;

        # Locus like 12p23.4
        my $loci = $2;

        # Record which gene(s) sit at this locus; the printing loop looks them up here
        push @{ $loci_entrez{$loci} }, $entrez_gene;

        # Chromosome number alone (digits only); loci such as Xp11.22 are skipped
        next unless $loci =~ /^(\d+)(.+)?/;
        my $chr = $1;

        # Locus minus chromosome number. If 12p23.4, then $band is p23.4
        my $band = $2;
        next unless defined $band;

        # Either p or q, followed by the band number (for p23.4: p and 23.4)
        next unless $band =~ /^([pq])(.+)/;
        my ($pq, $band_num) = ($1, $2);

        # Autovivification creates the nested hash and array as needed
        push @{ $chromosome{$chr}{$pq} }, $band_num;
    } # End of foreach loop

    foreach my $key (sort { $a <=> $b } keys %chromosome) {
        foreach my $key2 (sort { $a cmp $b } keys %{ $chromosome{$key} }) {
            # Remove duplicate band numbers within this arm only
            my %seen;
            my @unique = grep { !$seen{$_}++ } @{ $chromosome{$key}{$key2} };

            # Band numbers are sorted as strings
            my @sorted = sort @unique;
            foreach my $element (@sorted) {
                my $sorted_locus = "$key$key2$element";
                next unless exists $loci_entrez{$sorted_locus};
                print "$_,$sorted_locus\n" for @{ $loci_entrez{$sorted_locus} };
            }
        }
    }
} # End of main

main();
Jordan

In the most general case, the question is ambiguous about how to order integers that are equal but written differently, because of possible leading zeros (e.g. 01 and 1). The following comparison function (for sort) compares runs of digits by numeric value and uses lexicographic order only to break ties between equal integers. This is the same ordering as zsh's numeric sort.

sub alphanumcmp ($$)
  {
    my (@u,@v);
    if ((@u = $_[0] =~ /^(\d+)/) &&
        (@v = $_[1] =~ /^(\d+)/))
      {
        my $c = $u[0] <=> $v[0];
        return $c if $c;
      }
    if ((@u = $_[0] =~ /^(.)(.*)/) &&
        (@v = $_[1] =~ /^(.)(.*)/))
      {
        return $u[0] cmp $v[0] || &alphanumcmp($u[1],$v[1]);
      }
    return $_[0] cmp $_[1];
  }

For instance, one would get the following sorted elements:

a0. a00. a000b a00b a0b a001b a01. a01b a1. a1b a010b a10b a011b a11b

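Note 1: The use of <=> assumes that the numbers are not too large: digit runs beyond the range of Perl's native integers are converted to floating point and may compare as equal when they are not.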
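Both the limit in Note 1 and the one-call-per-character recursion (under `use warnings`, Perl warns about deep recursion once a sub nests 100 levels) can be sidestepped. The following is a sketch, not the answer's code: a hypothetical iterative `alphanumcmp_iter` with a hypothetical helper `cmp_digit_runs` that compares digit runs exactly as strings (leading zeros stripped, then by length, then character by character) instead of with <=>. It assumes ASCII digits and strings without newlines; under those assumptions it is intended to give the same ordering as alphanumcmp.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two runs of ASCII digits by value without
# converting them to Perl numbers, so arbitrarily long runs compare exactly.
sub cmp_digit_runs {
    my ($u, $v) = @_;
    s/^0+// for $u, $v;                  # drop leading zeros (on the copies)
    return length($u) <=> length($v)     # more significant digits = larger value
        || $u cmp $v;                    # same length: string order = numeric order
}

# Iterative variant: walks both strings position by position instead of
# recursing once per character.
sub alphanumcmp_iter {
    my ($x, $y) = @_;
    my $i = 0;
    while ($i < length($x) && $i < length($y)) {
        # If both suffixes start with digits, compare those digit runs by value
        my ($u) = substr($x, $i) =~ /^([0-9]+)/;
        my ($v) = substr($y, $i) =~ /^([0-9]+)/;
        if (defined $u && defined $v) {
            my $c = cmp_digit_runs($u, $v);
            return $c if $c;
        }
        # Otherwise (or on a tie) compare the current characters
        my $c = substr($x, $i, 1) cmp substr($y, $i, 1);
        return $c if $c;
        $i++;
    }
    # One string is a prefix of the other, or they are equal
    return $x cmp $y;
}

my @sorted = sort { alphanumcmp_iter($a, $b) }
    qw(a11b a1. a010b a00. a01b a0. a10b a000b a1b a001b a0b a011b a01. a00b);
print "@sorted\n";
```

On the example elements this should reproduce the order listed above, while a pair such as x99999999999999999999 and x100000000000000000000 is ordered by value rather than being treated as equal.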

Note 2: In the question, the user wants to do an alphanumeric sort on the second column (instead of the whole string). So, in this particular case, the comparison function could just be adapted to ignore the first column or a Schwartzian transform could be used.
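A sketch of the Schwartzian-transform option, applied to the question's records: each line is paired once with its second column, the pairs are sorted with alphanumcmp on that column only, and the original lines are then unwrapped. The function is copied unchanged from above so the block runs on its own; the record list is the question's data.

```perl
use strict;
use warnings;

# alphanumcmp from the answer above, copied unchanged
sub alphanumcmp ($$)
  {
    my (@u,@v);
    if ((@u = $_[0] =~ /^(\d+)/) &&
        (@v = $_[1] =~ /^(\d+)/))
      {
        my $c = $u[0] <=> $v[0];
        return $c if $c;
      }
    if ((@u = $_[0] =~ /^(.)(.*)/) &&
        (@v = $_[1] =~ /^(.)(.*)/))
      {
        return $u[0] cmp $v[0] || &alphanumcmp($u[1],$v[1]);
      }
    return $_[0] cmp $_[1];
  }

# The question's records (first column = ID, second column = locus)
my @records = (
    '80,1p21', '81,19q13', '82,6p12.3', '83,Xp11.22', '84,3pter-q21',
    '86,3q26.33', '87,14q24.1-q24.2|14q24|14q22-q24', '88,1q42-q43',
    '89,11q13.1', '90,2q23-q24', '91,12q13', '92,2q22.3', '93,3p22',
    '94,12q11-q14', '95,3p21.1', '97,14q24.3', '98,2p16.2',
);

# Schwartzian transform (read bottom to top)
my @sorted =
    map  { $_->[0] }                            # 3. keep the original line
    sort { alphanumcmp($a->[1], $b->[1]) }      # 2. compare second columns only
    map  { [ $_, (split /,/, $_, 2)[1] ] }      # 1. pair each line with its 2nd column
    @records;

print "$_\n" for @sorted;
```

Since the key is computed once per line rather than once per comparison, this also saves repeated splitting on larger files.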

vinc17