
I am parsing a large EMBL file (>1 GB) and converting it to a GFF file. Some entries do not match the standard EMBL format, which causes the BioPerl module to throw exceptions. Entries with errors are only a small portion of the total sequences, so for now I want the script to ignore the exceptions and continue. However, the Perl script is always stopped by the exceptions.

I am on Linux with Perl 5.8.8.

My Perl script:

use strict;
use Bio::SeqIO;
use Bio::Tools::GFF;
use warnings;
use Try::Tiny;

open (E ,">","emblError.txt");

if (@ARGV != 1) {    die "USAGE: embl2gff.pl <inputfile> > outputfile.\n"; }

my $in = Bio::SeqIO->new(-file=>$ARGV[0],-format=>'EMBL');
eval {
   while (my $seq = $in->next_seq) {
      for my $feat ($seq->top_SeqFeatures) {
          my $gffio = Bio::Tools::GFF->new(-gff_version => 3);
          print $feat->gff_string($gffio)."\n";
        }
    }
};
if ($@) {
    warn "Oh no! [$@]\n";
}

The error I got:

Name "main::E" used only once: possible typo at embl2GFF3.pl line 7.

--------------------- WARNING ---------------------
MSG: exception while parsing location line [join(9174..9343,14214..14303)complement(9268..9363),complement(9140..9198),complement(8965..9034),complement(8751..8884),complement(8419..8535),complement(8232..8337),complement(7952..8149),complement(7256..7332),complement(7051..7175),complement(6769..6877),complement(6601..6659),complement(4690..6530))] in reading EMBL/GenBank/SwissProt, ignoring feature mRNA (seqid=XcouVSXmac70forkSpecies.Scaffold1050.final):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad operator 1: had multiple locations 2, should be SplitLocationI
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:210
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:204
STACK: Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/FTHelper.pm:133
STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/embl.pm:403
STACK: embl2GFF3.pl:14
-----------------------------------------------------------

---------------------------------------------------

--------------------- WARNING ---------------------
MSG: exception while parsing location line [join(14219..14303,14368..14513)complement(9140..9198),complement(8965..9034),complement(8751..8884),complement(8419..8535),complement(8232..8337),complement(7952..8149),complement(7256..7332),complement(7051..7175),complement(6769..6877),complement(6601..6659),complement(6461..6530))] in reading EMBL/GenBank/SwissProt, ignoring feature CDS (seqid=XcouVSXmac70forkSpecies.Scaffold1050.final):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad operator 1: had multiple locations 2, should be SplitLocationI
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:210
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:204
STACK: Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/FTHelper.pm:133
STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/embl.pm:403
STACK: embl2GFF3.pl:14
-----------------------------------------------------------

---------------------------------------------------
Oh no! [Can't call method "isa" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/Bio/Seq.pm line 1142, <GEN0> line 538764.
]

NOTE: I did not paste the exception twice; this is how the output appears, and only one exception seems to be caught.

Here is the block of the EMBL file that causes the problem. The mRNA entry causes the first exception and the CDS entry causes the second.

FT   mRNA            join(9174..9343,14214..14303)
FT                   complement(9268..9363),complement(9140..9198),
FT                   complement(8965..9034),complement(8751..8884),
FT                   complement(8419..8535),complement(8232..8337),
FT                   complement(7952..8149),complement(7256..7332),
FT                   complement(7051..7175),complement(6769..6877),
FT                   complement(6601..6659),complement(4690..6530))
FT                   /gene="ENSXMAG00000014948"
FT                   /note="transcript_id=ENSXMAT00000015030"
FT   CDS             join(14219..14303,14368..14513)
FT                   complement(9140..9198),complement(8965..9034),
FT                   complement(8751..8884),complement(8419..8535),
FT                   complement(8232..8337),complement(7952..8149),
FT                   complement(7256..7332),complement(7051..7175),
FT                   complement(6769..6877),complement(6601..6659),
FT                   complement(6461..6530))
FT                   /gene="ENSXMAG00000014948"
FT                   /protein_id="ENSXMAP00000015010"
FT                   /note="transcript_id=ENSXMAT00000015030"
FT                   /db_xref="HGNC_transcript_name:ENO3-201"

1 Answer


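eval does not catch everything: it traps die, but not, for example, a call to exit or a crash in the interpreter itself. Also check for a $SIG{__DIE__} handler. If a die handler was written inexpertly, it might end the program on its own. For example, a handler that does not check $^S ($EXCEPTIONS_BEING_CAUGHT under use English) may call exit even when the die happened inside an eval.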
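As a minimal sketch of what a better-behaved die handler might look like (the handler body and messages are illustrative, nothing here comes from BioPerl): $^S is true while an eval is running, so the handler can return early and let the eval trap the error.

```perl
use strict;
use warnings;

# Illustrative handler, not part of BioPerl: only treat a die as fatal
# when no eval is waiting to catch it.
$SIG{__DIE__} = sub {
    my ($err) = @_;
    # $^S is true inside an eval and undefined while code is still compiling;
    # in both cases return and let the normal die continue.
    return if !defined $^S || $^S;
    print STDERR "FATAL: $err";
    exit 1;
};

# The handler is called for this die, sees $^S is true, and steps aside.
my $caught = '';
my $ok = eval { die "recoverable problem\n"; 1 };
$caught = $@ unless $ok;
print "eval trapped: $caught";
```

A handler that skipped the $^S check and went straight to exit would end the script before the eval ever got a chance to recover.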

But just looking at your output, if it printed this:

Oh no! [Can't call method "isa" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/Bio/Seq.pm line 1142, <GEN0> line 538764. ]

Then it's not doing what you said it was doing. Your eval is catching the error, or you wouldn't be able to print it with "Oh no!" in front. It looks like BioPerl is also dumping stack traces of its own for the problems it handles internally, that's all.

Finally, it looks like your program state is data-dependent and that some erroneous values in your file can put it in the wrong state. For whatever reason it could not create an object, and an undefined value was then passed to code in Bio/Seq.pm that checks whether its argument isa something or other. Judging by the message, the problem was hit while reading line 538,764 of your input file. But I could be wrong.
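One cheap way to find such records before BioPerl sees them might be to check the location strings yourself. This is only a heuristic sketch: the function name is made up, and it assumes a valid location never has text after its outermost closing parenthesis. The second test string is an abbreviated version of the mRNA location from the question; the others are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical check: true when the parentheses balance and nothing
# follows the outermost closing parenthesis.
sub location_looks_balanced {
    my ($loc) = @_;
    my ($depth, $closed) = (0, 0);
    for my $ch (split //, $loc) {
        return 0 if $closed;              # text after the outermost ")"
        if ($ch eq '(') {
            $depth++;
        }
        elsif ($ch eq ')') {
            $depth--;
            return 0 if $depth < 0;       # ")" with no matching "("
            $closed = 1 if $depth == 0;
        }
    }
    return $depth == 0 ? 1 : 0;
}

my @locations = (
    'join(1..5,complement(7..9))',
    'join(9174..9343,14214..14303)complement(9268..9363),complement(4690..6530))',
    '100..200',
);
for my $loc (@locations) {
    printf "%-4s %s\n", (location_looks_balanced($loc) ? 'ok' : 'BAD'), $loc;
}
```

In the question's data the join( group closes after the second range, and the complement(...) parts that follow sit outside it, which is the pattern this check rejects.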

NOTE: to address your issue in the comments. If BioPerl is handling the errors it finds, and you just want to slog through a series of records, then my suggestion is that you put your eval inside the loop, either the while or the for loop. This is a pretty standard form for some multi-threaded applications.

 while ( 1 ) {
     # warn rather than say: say needs Perl 5.10+, and you are on 5.8.8
     eval { $me->spin(); 1; } or warn "WARNING: $@";
     # unless we are officially done, just get ready to
     # handle somebody causing an exception in our thread.
     last if $me->done;
 }

Remember to put the eval at the place where you want to recover processing, if possible.
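Applied to a record reader, that might look like the sketch below. The helper name and the mock reader are made up so the example runs without BioPerl. With a real stream the first code ref would presumably be sub { $in->next_seq }, since the stack traces in the question pass through next_seq; whether the EMBL parser picks up cleanly at the next record after failing mid-record is an assumption I have not verified.

```perl
use strict;
use warnings;

# Hypothetical helper: call $next until it returns undef, passing each record
# to $handle. A die in either one is reported and the loop moves on.
sub process_skipping_errors {
    my ($next, $handle, $max_consecutive) = @_;
    $max_consecutive = 100 unless defined $max_consecutive;
    my ($ok, $failed, $streak) = (0, 0, 0);
    while (1) {
        my $done = 0;
        # The trailing 1 lets us detect failure without relying on $@ alone.
        my $survived = eval {
            my $rec = $next->();
            if (defined $rec) { $handle->($rec) } else { $done = 1 }
            1;
        };
        if (!$survived) {
            warn "WARNING: skipping record: $@";
            $failed++;
            # Guard against a source that fails forever without advancing.
            die "giving up after $streak consecutive failures\n"
                if ++$streak >= $max_consecutive;
            next;
        }
        last if $done;
        $ok++;
        $streak = 0;
    }
    return ($ok, $failed);
}

# Mock reader standing in for a sequence stream: 'BAD' entries die on read.
my @queue = ('rec1', 'BAD', 'rec2', 'BAD', 'rec3');
my $next = sub {
    return unless @queue;
    my $item = shift @queue;
    die "cannot parse record\n" if $item eq 'BAD';
    return $item;
};

my @seen;
my ($ok, $failed) = process_skipping_errors($next, sub { push @seen, $_[0] });
print "processed $ok, skipped $failed\n";
```

The eval wraps the read as well as the handling, so a die raised while fetching a record is trapped too, and the cap on consecutive failures keeps a reader that never advances from looping forever.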

Axeman
  • You are so fast. I have posted the script (I hit the wrong button previously). Do you mind being a little more specific? Thank you. – user2241994 Apr 03 '13 at 19:45
  • It throws two exceptions at once and maybe only one is caught? If that is the case I probably should change the title. – user2241994 Apr 03 '13 at 20:08
  • @user2241994, it could throw 20 more errors behind the scenes that it catches and recovers from, but you only see one error, printed in your `"Oh no! [$@]\n"` format. So it is being caught. The Bioperl modules are just helping you out, verbosely. Do you have bleeding-edge Bioperl--or a module? If not, I can't help but feel that you should find out what that input line should look like. – Axeman Apr 03 '13 at 20:14
  • Thank you. The SeqIO and GFF modules are quite standard modules. The EMBL file was generated by the RATT (Rapid Annotation Transfer Tool) program, which I know can be pretty buggy. I will do something to fix the bugs eventually, but for now I know >95% of the sequences are correct and I want to get those sequences so folks in the lab will have a project to work on. My question is how to ignore errors (no matter how many) and continue. – user2241994 Apr 03 '13 at 20:26
  • @user2241994, my guess is put your `eval` and handler *inside* the `while`--or `for` loop. That way it spews errors and keeps on chugging. – Axeman Apr 03 '13 at 20:38
  • I tried putting the eval between the while and the for, and inside the for loop. I even tried two evals, but none of them worked. I guess I might have to patch the BioPerl module itself to bypass it. Thank you for your previous comments and suggestions. – user2241994 Apr 03 '13 at 20:52
  • @user2241994, well you can't always know how to recover from an exception. :( – Axeman Apr 03 '13 at 21:04