
I have SNP data and gene list data. For each SNP position, I want to find whether it lies within a gene in the gene list and, if so, at which position. For example:

  1. The SNP data:

    Pos_start pos_end 
    14185     14185      
    ....      .....   
    
  2. The gene list data:

    5'side(pos_start)  3'side(pos_end)
      1                  1527      
      1920               1777 
      ....               ..... 
    
  3. The result: the SNP at position 14185 is contained at position 16185 of the gene list.
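A minimal sketch of the lookup described above, under stated assumptions: each gene is a (5' side, 3' side) pair from the gene list, a row whose start is greater than its end (like `1920  1777`) is taken to be on the reverse strand, and "position in the gene" is taken to mean a 1-based offset counted from the 5' side. The helper name `locate_snp` and the SNP positions 100 and 1800 are made up for illustration; only the two gene rows shown above are used.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (gene_index, offset) for the first gene whose
# interval contains $pos, or an empty list if no gene contains it.
# Assumptions: start > end means reverse strand; the offset is 1-based and
# counted from the gene's 5' side.
sub locate_snp {
    my ($pos, @genes) = @_;
    for my $i (0 .. $#genes) {
        my ($five, $three) = @{ $genes[$i] };
        my ($lo, $hi) = $five <= $three ? ($five, $three) : ($three, $five);
        next if $pos < $lo || $pos > $hi;    # SNP lies outside this gene
        my $offset = $five <= $three ? $pos - $five + 1 : $five - $pos + 1;
        return ($i, $offset);
    }
    return;    # not contained in any gene
}

# The two gene-list rows from the example: [5' side, 3' side]
my @genes = ([1, 1527], [1920, 1777]);

for my $snp (100, 1800, 14185) {
    my ($i, $offset) = locate_snp($snp, @genes);
    if (defined $i) {
        print "SNP $snp: gene row ", $i + 1, ", position $offset from the 5' side\n";
    } else {
        print "SNP $snp: not in any listed gene\n";
    }
}
```

With only these two rows, 14185 falls in neither gene; the full gene list would be needed to place it. The linear scan is simple but checks every gene for every SNP; for large lists, sorting the genes by start and searching would scale better.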

Below is my code, but it has a problem sorting the numbers.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # posi1.txt holds both data sets, collected into one file
    open(my $pos1, '<', 'posi1.txt') or die "Cannot open posi1.txt: $!";
    my @posi1 = <$pos1>;
    close($pos1);

    open(my $list, '>', 'list.txt') or die "Cannot open list.txt: $!";
    my @list1 = @posi1;
    my @list2 = sort num_last @list1;
    my $list2 = join('', @list2);

    print $list2;
    print {$list} $list2, "\n\n";
    close($list);

    # Sort helper: lines that start with a digit go last
    sub num_last {
        my $num_a = $a =~ /^[0-9]/;
        my $num_b = $b =~ /^[0-9]/;
        if ($num_a && $num_b) {
            return $a <=> $b;
        } elsif ($num_a) {
            return 1;
        } elsif ($num_b) {
            return -1;
        } else {
            return $a cmp $b;
        }
    }

I would appreciate it if you could give me some pointers.

Phan
  • The question is unclear. I can't tell the answer to `in the position 14185 of SNP contain at the (??) position of the gen list.` from the example, so I can't understand the desired result. Could you please explain further what you expect? We are not bioinformaticians (at least not all of us), so you might need to explain some obvious things. – J0HN Aug 19 '11 at 06:47
  • For the SNP at start position 14185, I don't know whether it is contained in the gene list or not, so I am looking for whether that position is contained in the gene list and, if so, its position within the gene. For example, at SNP position 14185 the base is T. – Phan Aug 19 '11 at 07:39
  • That clarifies nothing. Ask the right question, get the right answer, and the question is not `right` now. What's an SNP? How is the `SNP` data connected to the `gen` list in your sample? What do `genotype` and `reference` mean? `initiation condon` is the [start codon](http://en.wikipedia.org/wiki/Start_codon) of the gene, I presume? Are `5'side` and `3'side` relevant to the task? – J0HN Aug 19 '11 at 07:45
  • SNP: a single-nucleotide polymorphism (SNP, pronounced "snip"). Genotype and reference are not important in this problem. For example, two sequenced DNA fragments from different individuals, AAGCCTA and AAGCTTA, differ in a single nucleotide. 5'side and 3'side are the same as Pos_start and pos_end. My problem is finding where each SNP position lies within a gene. – Phan Aug 19 '11 at 08:05
  • Could you please remove the irrelevant information from your question? It's still hard to tell what you are looking for. – J0HN Aug 19 '11 at 08:08



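First of all, a sort subroutine does not receive the two values to compare in `@_`: `sort` sets the package variables `$a` and `$b` for it. So use `$a` and `$b` directly, and don't declare `my ($a, $b)` inside the sub:

    sub num_last {
        my ($num_a, $num_b);
        # $a and $b are set by sort; @_ is empty here
        # ...
    }

Then, your regex only tests whether the string starts with a digit (a match in scalar context just returns true or false), so you never get the number itself. Capture the leading number instead, and skip any leading whitespace, just in case:

    ($num_a) = $a =~ /^\s*(\d+)/;
    ($num_b) = $b =~ /^\s*(\d+)/;

`\d+` is roughly equivalent to `[0-9]+`, but three chars shorter :). (Strictly, `\d` can also match non-ASCII digits.) The parentheses force list context, so `$num_a` and `$num_b` receive the content of the first capture group, `(\d+)`, or `undef` if the string does not start with a number.

Then, compare the captured numbers with `<=>`, not `cmp`: `cmp` compares strings character by character, so `"10"` would sort before `"9"`. Lines without a leading number go first, and two such lines are compared as strings:

    return $a cmp $b if !defined $num_a && !defined $num_b;
    return -1 if !defined $num_a;
    return  1 if !defined $num_b;
    return $num_a <=> $num_b;

In Perl, an `if` always needs braces around its body; the statement-modifier form used above (`return -1 if ...`) is the compact alternative. Testing with `defined` instead of `!$num_a` also keeps a line starting with `0` from being treated as non-numeric. So, the final `num_last` function:

    sub num_last {
        # sort sets $a and $b; capture each line's leading number, if any
        my ($num_a) = $a =~ /^\s*(\d+)/;
        my ($num_b) = $b =~ /^\s*(\d+)/;

        return $a cmp $b if !defined $num_a && !defined $num_b;  # both non-numeric
        return -1 if !defined $num_a;    # non-numeric lines sort first
        return  1 if !defined $num_b;    # numeric lines sort last
        return $num_a <=> $num_b;        # numeric comparison
    }

If you need the numbers in descending order, swap the operands in the last line: `return $num_b <=> $num_a;`.

I've written this without running it, so there might be some minor errors in it.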
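A possible alternative, sketched here rather than taken from the answer: extract each line's leading number once, sort on that key, then strip the key again (decorate-sort-undecorate, the idea behind the Schwartzian transform). It assumes this ordering: lines without a leading number come first in string order, then lines that start with a number, ordered numerically. The name `sort_numbers_last` and the sample lines are illustrative.

```perl
use strict;
use warnings;

# Lines without a leading number come first (string order); lines that start
# with a number follow, ordered numerically by that number.
# Each line's leading number is extracted only once (decorate-sort-undecorate).
sub sort_numbers_last {
    my @keyed;
    for my $line (@_) {
        my ($n) = $line =~ /^\s*(\d+)/;    # undef if there is no leading number
        push @keyed, [ $n, $line ];
    }
    my @sorted = sort {
        my ($ka, $kb) = ($a->[0], $b->[0]);
          !defined $ka && !defined $kb ? $a->[1] cmp $b->[1]
        : !defined $ka                 ? -1
        : !defined $kb                 ?  1
        :                                $ka <=> $kb;
    } @keyed;
    return map { $_->[1] } @sorted;    # drop the keys again
}

my @lines = ("Pos_start pos_end\n", "  14185     14185\n", "  1920  1777\n", "1  1527\n");
print sort_numbers_last(@lines);
```

Because the regex runs once per line instead of once per comparison, this scales a little better on large files; the ordering is otherwise what a correct `num_last` comparator would give.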

J0HN
  • Subroutines used in conjunction with `sort` do not need to unpack `@_`. They can work directly with the package global variables `$a` and `$b`. See http://perldoc.perl.org/functions/sort.html. Also your advice not to use `<=>` won't work well if the OP actually requires a numeric comparison. – FMc Aug 19 '11 at 11:55
  • Thank you very much, I have sorted the data. Now I have a problem with an operation in Perl. I want to write this Excel formula: `=MOD((AH4-AG4),3)+1` (where AH4=14185 and AG4=13628). Could you show me how to write it in Perl? – Phan Aug 22 '11 at 02:14
  • I assume `MOD` is the modulo operator, which is `%` in Perl: `my $ah4 = 14185; my $ag4 = 13628; my $rslt = (($ah4 - $ag4) % 3) + 1;`. Take a look at [perlop](http://perldoc.perl.org/perlop.html). – J0HN Aug 22 '11 at 06:29
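For reference, the Excel formula from the comments can be wrapped in a small function and checked against the values given there. The function name `excel_mod_plus_one` is hypothetical. One detail worth knowing: with a positive divisor, Perl's `%` (without `use integer`) returns a result from 0 up to the divisor minus one even when the left operand is negative, which matches Excel's `MOD`, so the translation also holds when AH4 is smaller than AG4.

```perl
use strict;
use warnings;

# Perl version of the Excel formula =MOD((AH4-AG4),3)+1.
# With a positive divisor, % yields 0..2 here, like Excel's MOD.
sub excel_mod_plus_one {
    my ($ah4, $ag4) = @_;
    return (($ah4 - $ag4) % 3) + 1;
}

print excel_mod_plus_one(14185, 13628), "\n";    # prints 3
```

Here 14185 - 13628 = 557, and 557 % 3 = 2, so the result is 3.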