-2

I am trying to analyze several thousand sequences for double restriction enzyme cuts. For this example, let's say these two enzymes are EcoRI and MspI. I am using Bioperl modules for this (see the code below) and can make it work for single enzymes. It's just that the two enzymes working altogether are giving either weird results, or not working as expected.

#!/usr/bin/perl

use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::Restriction::Analysis;
use Bio::Restriction::EnzymeCollection;
use List::Util qw(min max sum);

my $usage = "perl $0 <in><out>";
my $in = shift or die $usage;
my $out = shift or die $usage;

my ($ra, $seq);
my $enz_collection = Bio::Restriction::EnzymeCollection->new();
my (@ecorbp, @ecorsize, @mspbp, @mspsize);

my $seq_in = Bio::SeqIO->new( 
                            -format => 'fasta', 
                            -file => $in,
                            );
open OUT, ">$out";
while ($seq = $seq_in -> next_seq) 
{
$ra = Bio::Restriction::Analysis->new(-seq=>$seq);

#how many times does each enzyme cut
my $EcoRIcuts = $ra->cuts_by_enzyme('EcoRI');
my $MspIcuts = $ra->cuts_by_enzyme('MspI');
my $bothcuts = $ra->cuts_by_enzyme('EcoRI', 'MspI');
#print $bothcuts."\n";

#doing something with bp
@ecorbp = $ra->positions('EcoRI');
@mspbp = $ra->positions('MspI');
my $ecorbp_min = min(@ecorbp); 
my $ecorbp_max = max(@ecorbp); 
my $mspbp_min = min(@mspbp); 
my $mspbp_max = max(@mspbp); 
my $ecorbp_avg = sum(@ecorbp)/@ecorbp;
my $mspbp_avg = sum(@mspbp)/@mspbp;

#doing something with cut size/fragment
@ecorsize = $ra->sizes('EcoRI');
@mspsize = $ra->sizes('MspI');
my $ecorsize_min = min(@ecorsize); 
my $ecorsize_max = max(@ecorsize); 
my $mspsize_min = min(@mspsize); 
my $mspsize_max = max(@mspsize); 
my $ecorsize_avg = sum(@ecorsize)/@ecorsize;
my $mspsize_avg = sum(@mspsize)/@mspsize;

my @bothsize = $ra->sizes('EcoRI','MspI');
my $bothsize_max = max (@bothsize);
my $bothsize_min = min (@bothsize);
print $bothsize_max,"\n";
print $bothsize_min,"\n";

#do similar thing with the cut positions; this part not written yet
@bothbp = $ra->positions('EcoRI','MspI');

#print OUT "EcoRI cuts ",$seq -> display_id," $EcoRIcuts times with at minimum base position of $ecorbp_min, maximum base position of $ecorbp_max with average base position of $ecorbp_avg to give minimum frament size of $ecorsize_min bp, maximum fragment size of $ecorsize_max bp with average fragment size of $ecorsize_avg bp\n";
#print OUT "MspI cuts ",$seq -> display_id," $MspIcuts times with at minimum base position of $mspbp_min, maximum base position of $mspbp_max with average base position of $mspbp_avg to give minimum frament size of $mspsize_min bp, maximum fragment size of $mspsize_max bp with average fragment size of $mspsize_avg bp\n";
}

close OUT;

The problem comes when the code reaches the 'bothsize' part... printing numbers in decimals for some weird reason.

Here's my very short input file that I am using for testing the code:

>Seq1
AAAAGAATTCAAAAAAAAGAAAACAATAAGTAAAAGACAGACCGGCAGAGAAAACTTACCCGAC
>Seq2
AAACGAGAATTCAAAAAATAGAAAACCGGAATAAGTAAAAGACAGGAATTCAGCAGCCGGAGAA

The code is expected to report the position where both enzymes cut (similar to ecorbp, mspbp) & also the cut fragment size (similar to ecorsize and mspsize).

I would appreciate any help or pointers on how I can code this right. Thanks.

berge2015
  • 27
  • 4
  • What is the expected behavior? – Matt Jacob Sep 23 '16 at 23:23
  • It's supposed to report the cut position & also the cut fragment size. Also edited the question above to clarify. – berge2015 Sep 23 '16 at 23:35
  • What you do with your own code is up to you. But if you're asking for free help then it's rather crude to post a block of Perl that isn't indented to make it readable. – Borodin Sep 24 '16 at 00:22
  • @Borodin why so salty bro? i am obviously not seeking free help... otherwise wouldn't be showing my code to learn where i am making mistakes. – berge2015 Sep 24 '16 at 00:26
  • @berge2015: I don't understand. Are you saying that you intend to pay us? I'm just saying that it seems unfair to ask for help when you can't be bothered to lay out your code properly. It's also quite likely that you would have seen the problem for yourself if you had made it readable. – Borodin Sep 24 '16 at 00:30
  • @Borodin well, if it isn't readable to you, then you shouldn't spend your good evening here sir. – berge2015 Sep 24 '16 at 00:31
  • It seems quite likely that the `sizes` method returns numeric sizes. That means all the elements of `@bothsize` are numbers, so I don't understand why you're surprised to get decimal output. – Borodin Sep 24 '16 at 00:33
  • 1
    You think we should suck up your poor layout as well as trying to help you? No. You should have the good manners to write readable code when you are asking for help. Most people abide by [`perlstyle`](http://perldoc.perl.org/perlstyle.html) – Borodin Sep 24 '16 at 00:37
  • @Borodin Thanks for pointing that out but I guess you are not familiar with the biology of restriction enzymes and base pairs... they are not in decimal figures. Besides, as I explained clearly, the method (including `sizes`) works for individual enzymes. The trouble is when both enzymes are used. – berge2015 Sep 24 '16 at 00:39
  • @berge2015: It's probably best if you read [the documentation](http://search.cpan.org/dist/BioPerl/Bio/Restriction/Analysis.pm) for the software you're using instead of coming over all entitled. – Borodin Sep 24 '16 at 00:54

1 Answers1

2

The sizesmethod of a Bio::Restriction::Analysis object doesn't accept multiple enzymes

The second parameter is "kilobases to the nearest 0.1 kb"

Borodin
  • 126,100
  • 9
  • 70
  • 144
  • 1
    @ber A look [at the code](https://metacpan.org/source/CJFIELDS/BioPerl-1.6.924/Bio/Restriction/Analysis.pm#L618) makes it even more clear. `'MspI'` is passed as the second argument. That's a true value, so it does a bit of math to convert to the nearest 0.1 kb, whatever that means. All you need to do is call that function twice. It doesn't take domain knowledge to read a few simple lines of code. But it does help to include relevant domain knowledge in the question know more about the tools you use can help you better. If your chainsaw broke, you'd tell them how big the tree was, wouldn't you? – simbabque Sep 24 '16 at 04:06