I am seeing some very bizarre behavior in a script that I wrote and have used for years but which, for some reason, fails on one particular file.
Because the script fails to find a key that should be in a hash, I added some test print statements to display the keys. My normal strategy is to place asterisks before and after the variable to detect potential hidden characters. Clearly, the keys are corrupt. Relevant code block:
foreach my $fastaRecord (@GenomeList) {
    my ($ID, $Seq) = split(/\n/, $fastaRecord, 2);
    # uncomment next line to strip everything off sequence
    # header except trailing numeric identifiers
    # $ID =~ s/.+?(\d+$)/$1/;
    $Seq =~ s/[^A-Za-z-]//g; # remove everything except letters and hyphens (including newline characters)
    $RefSeqLen = length($Seq);
    $GenomeLenHash{$ID} = $RefSeqLen;
    print "$ID\n";
    print "*$ID**\n"; # debug: asterisks mark where the key starts and ends
}
This produces the following output:
supercont3
**upercont3
Mitochondrion
**itochondrion
Chr1
**hr1
Chr2
**hr2
Chr3
**hr3
Chr4
**hr4
Normally, I'd suspect "illegal" newline characters. However, I manually replaced all newlines in the input file to try to solve the problem. What in the input file could cause the script to behave this way? I can imagine that, despite my efforts, an illegal newline still follows the ID, but then why is the first asterisk not printed, why does no newline appear after the double asterisk, and why is the double asterisk printed at the beginning of the line, overwriting the first asterisk as well as the start of the variable's value?
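Since the terminal itself may be interpreting a control character in the key, the asterisk markers could be misleading. A minimal sketch of a more explicit check is shown here: it escapes every character outside printable ASCII before printing. The helper name `visible` and the sample key are hypothetical and only for illustration; I am assuming, not asserting, that the real keys contain something like a trailing carriage return.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): render every
# character outside printable ASCII as a \x{..} escape so that hidden
# characters become visible in the output.
sub visible {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E])/sprintf('\\x{%02X}', ord($1))/ge;
    return $text;
}

# Made-up sample key with a trailing carriage return (an assumption).
my $ID = "Chr1\r";
print '*', visible($ID), "**\n";
```

In the loop above, the debug line would become `print '*', visible($ID), "**\n";`, so any stray control character in `$ID` is printed as an escape instead of being acted on by the terminal.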