
I've been using the Bio::DB::EntrezGene module from BioPerl to retrieve Entrez gene names given the numerical ID.

This worked fine for months, as recently as two weeks ago. Now, though, it only returns an error.

The strangest thing (to me) is that this happens even if I just run the sample code from the documentation. For example, if I run this:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::EntrezGene;

my $db = Bio::DB::EntrezGene->new;

my $seqio = $db->get_Stream_by_id([2, 4693, 3064]); # Gene ids
while ( my $seq = $seqio->next_seq ) {
    print "id is ", $seq->display_id, "\n";
}

exit;

I get this:

Replacement list is longer than search list at /Library/Perl/5.12/Bio/Range.pm line 251.
UNIVERSAL->import is deprecated and will be removed in a future perl at /Library/Perl/5.12/Bio/Tree/TreeFunctionsI.pm line 94
Data Error: none conforming data found on line 1 in /var/folders/2f/55z0d46n3l10bq650j6svgw89rmqw1/T/mkguvw1MOO/VR86iPUDSJ!
first 20 (or till end of input) characters including the non-conforming data:
::= {
  {
    track-
 at /Library/Perl/5.12/Bio/SeqIO/entrezgene.pm line 171

If anyone has any ideas about what might be going on and how to fix it, it'd be much appreciated. Thanks!

Matt LaFave

1 Answer


You appear to have an error in the data on which this module depends. What is in this folder?

/var/folders/2f/55z0d46n3l10bq650j6svgw89rmqw1/T/mkguvw1MOO/VR86iPUDSJ!

And how is it generated?


Update

You may be pleased to know that I get the same fault. The data is downloaded from the internet, and the source is corrupt. I don't know enough to tell you who to contact, but that is where the fault is.

Borodin
  • No idea, I'm afraid. It's some kind of temporary directory, and it changes each time I make an attempt to run the script. For example, I ran it again just now, and the file in question became /var/folders/2f/55z0d46n3l10bq650j6svgw89rmqw1/T/rl0qhu09c1/pXy8RKI7n3. The /var/folders/2f/55z0d46n3l10bq650j6svgw89rmqw1/T/ part is still there, but if I check that T directory, it doesn't have either of the subdirectories (neither mkguvw1MOO nor rl0qhu09c1). – Matt LaFave Apr 24 '13 at 18:18
  • Ah, that's good news - sounds like the problem is on the server side. I've contacted NCBI to see what's going on. Thanks! – Matt LaFave Apr 24 '13 at 19:04
  • @MattLaFave: Looking further, the problem seems to be that the data starts with `Entrezgene-Set ::= ` and includes three items. BioPerl is expecting only `Entrezgene ::=`, and will not cope with sets. I guess BioPerl won't handle this aspect of Entrez Gene data. – Borodin Apr 24 '13 at 19:11
  • @MattLaFave: If you look at the `Bio::ASN1::EntrezGene` module, the `next_seq` subroutine insists on `Entrezgene ::=` at the start of the data. BioPerl won't handle Entrez Gene sets. Over to you :) You may want to get in touch with the author, Dr. Mingyi Liu, but I can't promise you'll get a reply you'll like. I'd be happy to upgrade the module myself if someone would like to engage me. – Borodin Apr 24 '13 at 19:29
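
To make the diagnosis in the last two comments concrete, here is a minimal sketch of how one might check which header a downloaded record starts with before handing it to the parser. The helper `entrezgene_header_kind` is hypothetical and not part of BioPerl or `Bio::ASN1::EntrezGene`, and the sample strings are made-up prefixes for illustration rather than real NCBI output. It only classifies the header; it does not make the parser accept sets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): classify the top-level ASN.1
# header found at the start of an Entrez Gene text record.
sub entrezgene_header_kind {
    my ($text) = @_;
    return 'set'     if $text =~ /\A\s*Entrezgene-Set\s*::=/;   # wrapped set of records
    return 'single'  if $text =~ /\A\s*Entrezgene\s*::=/;       # header the comments say the parser expects
    return 'unknown';
}

# Made-up sample prefixes, for illustration only.
my %samples = (
    set    => "Entrezgene-Set ::= {\n  {\n",
    single => "Entrezgene ::= {\n",
);

for my $name ( sort keys %samples ) {
    printf "%-6s => %s\n", $name, entrezgene_header_kind( $samples{$name} );
}
```

If the real download is classified as `set`, that would match the explanation above: the parser stops at the first line because it is looking for `Entrezgene ::=`.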