I'm working on a project that's scalating a lot, lately, and I'm re-writing code to make it more OOP and passing all redundant code into sub-routines.
The script checks whether a gene exists in the database (through various means) or not. It may also report possible duplicates. Before reporting a duplicate, the script makes sure it's not a "biological duplicate" (essentially the same biological data but a with different position in the genome and, hence, not an actual duplicate). In order to do so...
my @gene_ids;
my @gene_names;
while(my $gene = $geners_bychecksum->next){
my $gene_name = $gene->gene_name;
my $gene_id = $gene->gene_id;
push @gene_ids, $gene_id;
push @gene_names, $gene_name;
}
print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",@gene_names).")\n";
my $solve_multihit = solve_multihit($id, \@gene_names, \@gene_ids, $spc, $species_directory, $dataset);
print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",@gene_names).")\n";
if($solve_multihit){
print STDERR "$id\tM\tUPDATE \n";
print $report "$id\tM\tUPDATE \n";
$countM++;
} else {
print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",@gene_names).")\n";
}
Here, $geners_bychecksum
is a DBIC resulset with database hits from a prior search and, for this case-scenario, it always has more than 1 gene. The $id
,$spc
,$species_directory
and $dataset
are all strings that come from the config and are defined above this chunk.
The solve_multihit
subroutine is a rather complicated function that tries to resolve whether the multi-hits are actual duplicates or biological duplicates. Notice that I'm passing the @gene_names
and @gene_ids
arrays to this function. This function will return the gene_id of the proper gene, if it was able to solve the discrepancy; or 0 if not. Simplified code for the sub can be found in the following link
THE ACTUAL QUESTION
You may have noticed that the
print STDERR "$id\tJ\tALERT CHECKSUM MULTI HIT\t(".join(",",@gene_names).")\n";
is both before and after the solve_multihit
subroutine call... and the array seems to go empty after running the function, according to the STDERR:
BBOV_I005030 J ALERT CHECKSUM MULTI-HIT (XP_001609152.1,XP_001609157.1)
BBOV_I005030 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005040 J ALERT CHECKSUM MULTI-HIT (XP_001609156.1,XP_001609153.1)
BBOV_I005040 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005050 J ALERT CHECKSUM MULTI-HIT (XP_001609154.1,XP_001609155.1)
BBOV_I005050 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005060 J ALERT CHECKSUM MULTI-HIT (XP_001609154.1,XP_001609155.1)
BBOV_I005060 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005070 J ALERT CHECKSUM MULTI-HIT (XP_001609156.1,XP_001609153.1)
BBOV_I005070 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005080 J ALERT CHECKSUM MULTI-HIT (XP_001609152.1,XP_001609157.1)
BBOV_I005080 J ALERT CHECKSUM MULTI-HIT ()
Why would that happen? I'm pretty sure I could solve it by returning the arrays along with the results of the solve_multihit{}
sub, but I wonder why would it go empty.
PS: The J in the report is just a case-scenario key code.