I've tested my program on a dozen Windows machines, half a dozen Macs, and one Linux machine. It runs without error on Windows and Linux, but not on the Macs. The program works with protein database files, which are text files ranging from 250MB to 10GB. I took one tenth of the 250MB file to make a sample file for debugging, but the error did not occur with the smaller file.
I've narrowed the bug down to this section of code, where $tempFile is the protein database file:
    open(ps_file, "..".$slash."dataset".$slash.$tempFile)
        or die "couldn't open $tempFile";
    while(<ps_file>){
        chomp;
        my @curLine = split(/\t/, $_);
        my $filter = 1;
        if($taxon){
            chomp($curLine[2]);
            print "line2 ".$curLine[2].",\t".$taxR{$curLine[2]}."\n";
            $filter = $taxR{$curLine[2]};
        }
        if($filter){
            checkSeq(@curLine);
        }
    }
This is a screenshot of the output of that print statement on a Mac, showing special characters:
This is what the output looks like on a Windows machine:
Here is an example of one line from $tempFile:
>sp|P48255|ABCX_CYAPA Probable ATP-dependent transporter ycf16 OS=Cyanophora paradoxa GN=ycf16 PE=3 SV=1 MSTEKTKILEVKNLKAQVDGTEILKGVNLTINSGEIHAIMGPNGSGKSTFSKILAGHPAYQVTGGEILFKNKNLLELEPEERARAGVFLAFQYPIEIAGVSNIDFLRLAYNNRRKEEGLTELDPLTFYSIVKEKLNVVKMDPHFLNRNVNEGFSGGEKKRNEILQMALLNPSLAILDETDSGLDIDALRIVAEGVNQLSNKENSIILITHYQRLLDYIVPDYIHVMQNGRILKTGGAELAKELEIKGYDWLNELEMVKK CYAPA
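Since the special characters are only visible in a screenshot, a small check like the following might help show exactly which bytes end up in the third field. This is only a sketch: `show_hidden` is a hypothetical helper that is not part of my program, the record is a shortened made-up line in the same shape as the sample above, and the `\r\n` line ending is an assumption used purely to demonstrate the output format, not something I have confirmed about the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): returns a copy of the
# string with every character outside printable ASCII written as \x{..}.
sub show_hidden {
    my ($text) = @_;
    (my $visible = $text) =~ s/([^\x20-\x7E])/sprintf('\\x{%02X}', ord($1))/ge;
    return $visible;
}

# Made-up record: header, sequence, taxon code, separated by tabs.
# The "\r\n" ending is an assumption for this demo only.
my $line = ">sp|P48255|ABCX_CYAPA example header\tMSTEKTKILEVK\tCYAPA\r\n";

chomp $line;                      # with the default $/, strips only "\n"
my @fields = split /\t/, $line;

# Brackets make leading or trailing invisible characters easy to spot.
print "field 2: [", show_hidden($fields[2]), "]\n";
```

With the assumed line ending this prints `field 2: [CYAPA\x{0D}]`. Calling the same helper on `$curLine[2]` inside the loop on a Mac would show whether the real data carries a comparable stray character.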