I wrote a Perl program that takes two inputs: an Excel sheet (converted to a text file by changing the extension from .xls to .txt) and a sequence file. The Excel sheet contains the start point and the end point of a region in the sequence file (along with 70 flanking values on either side of the match area) that needs to be cut out and written to a third, output file. There are about 300 such entries. The program reads in the start point and the end point of the region to be cut each time, but it repeatedly tells me that the position is outside the length of the input sequence when it clearly isn't. I just can't seem to get this fixed.
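To make the goal concrete, here is a toy sketch of the extraction I am after. It is only an illustration: the sub name `extract_region`, the sample sequence, and the coordinates are made up, and it assumes the start and end values are 1-based and inclusive, so that `substr` needs an offset of start - 1 and a length of end - start + 1. The real files may use a different convention.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the program below): return the region
# from $start to $end of $seq, assuming 1-based, inclusive coordinates.
# Returns undef when the requested region does not fit inside $seq.
sub extract_region {
    my ( $seq, $start, $end ) = @_;
    return undef if $start < 1 || $end < $start || $end > length $seq;
    return substr( $seq, $start - 1, $end - $start + 1 );
}

my $toy = "ACGTACGTAC";    # made-up 10-base sequence
print extract_region( $toy, 3, 6 ), "\n";    # prints GTAC (bases 3 to 6)
```

Checking the range before calling `substr` means an out-of-range request shows up as an explicit undef rather than as a warning further down.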
This is the program:
use strict;
use warnings;
my $blast;
my $i;
my $idline;
my $sequence;
print "Enter Your BLAST result file name:\t";
chomp( $blast = <STDIN> );    # BLAST result file name
print "\n";
my $database;
print "Enter Your Gene list file name:\t";
chomp( $database = <STDIN> );    # sequence file
print "\n";
open IN, "$blast" or die "Can not open file $blast: $!";
my @ids       = ();
my @seq_start = ();
my @seq_end   = ();
while (<IN>) {
    # splitting each line of the result file on tabs
    my @feilds = split( "\t", $_ );
    push( @ids, $feilds[0] );    # copying the name of the sequence
    # copying the value after the 6th tab, which is the start point from where the sequence should be cut
    push( @seq_start, $feilds[6] );
    # copying the value after the 7th tab, which is the end point up to where the sequence should be cut
    push( @seq_end, $feilds[7] );
}
close IN;
open OUT, ">Result.fasta" or die "Can not open file $database: $!";
for ( $i = 0; $i <= $#ids; $i++ ) {
    ($sequence) = &block( $ids[$i] );
    ( $idline, $sequence ) = split( "\n", $sequence );
    # extracting the sequence from the start point to the end point
    my $seqlen = $seq_end[$i] - $seq_start[$i] - 1;
    my $Nucleotides = substr( $sequence, $seq_start[$i], $seqlen );    # storing the extracted substring in $Nucleotides
    $Nucleotides =~ s/(.{1,60})/$1\n/gs;
    print OUT "$idline\n";
    print OUT "$Nucleotides\n";
}
print "\nExtraction Completed...";
sub block {
    # looks up the record whose id is the first tab-separated field of the BLAST output file
    my $id1 = shift;
    print "$id1\n";
    my $start = ();
    open IN3, "$database" or die "Can not open file $database: $!";
    my $blockseq = "";
    while (<IN3>) {
        if ( ( $_ =~ /^>/ ) && ($start) ) {
            last;
        }
        if ( ( $_ !~ /^>/ ) && ($start) ) {
            chomp;
            $blockseq .= $_;
        }
        if (/^>$id1/) {
            my $start = $. - 1;
            my $blockseq .= $_;
        }
    }
    close IN3;
    return ($blockseq);
}
BLAST RESULT FILE: http://www.fileswap.com/dl/Ws7ehftejp/
SEQUENCE FILE: http://www.fileswap.com/dl/lPwuGh2oKM/
Error
substr outside of string at Nucleotide_Extractor.pl line 39.
Use of uninitialized value $Nucleotides in substitution (s///) at Nucleotide_Extractor.pl line 41.
Use of uninitialized value $Nucleotides in concatenation (.) or string at Nucleotide_Extractor.pl line 44.
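As a point of reference, the first warning is the one Perl's `substr` emits whenever the offset lies beyond the end of the string it is given (an empty string included). In that case it returns undef, which would account for the two follow-on warnings. A minimal sketch with made-up values, independent of my files:

```perl
use strict;
use warnings;

# Collect warnings in an array so they can be inspected after the call.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $empty = "";                         # stand-in for a string shorter than the offset
my $piece = substr( $empty, 5, 10 );    # offset 5 is past the end of ""

print defined $piece ? "defined\n" : "undef\n";    # prints undef
print @warnings;                                   # "substr outside of string at ..."
```

So the warning only says that the string handed to `substr` was shorter than the offset at that moment; printing `length($sequence)` just before the call is one way to see what it actually received.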
Any help is very much appreciated, and questions are always welcome.