Does anyone know how I can get the scientific name (or all the features) of a GenBank record using only its GenBank accession code and Biopython? For example:

>>> from Bio import Entrez
>>> Entrez.email = "someuser@mail.com"
>>> Input = Entrez.someFunction(db="nucleotide", term="AY851612")
>>> output = Entrez.read(Input)
>>> print output

"Austrocylindropuntia subulata"

Or alternatively:

>>> print output

"LOCUS AY851612 892 bp DNA linear PLN 10-APR-2007
DEFINITION Opuntia subulata rpl16 gene, intron; chloroplast.
ACCESSION AY851612
VERSION AY851612.1 GI:57240072
KEYWORDS .
SOURCE chloroplast Austrocylindropuntia subulata
ORGANISM Austrocylindropuntia subulata
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
Caryophyllales; Cactaceae; Opuntioideae; Austrocylindropuntia.
REFERENCE 1 (bases 1 to 892)
AUTHORS Butterworth,C.A. and Wallace,R.S.
..."

Thanks to all! =)

Ivan Castro
  • Have you read through the [appropriate section](http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc108) of the Biopython tutorial on accessing Entrez resources? – MattDMo Feb 05 '15 at 21:59
  • Yup, I read chapter 9, which covers "Accessing NCBI's Entrez databases", but it focuses on the GI code instead of the GB code (or accession code). =( – Ivan Castro Feb 05 '15 at 22:07

1 Answer

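Note that the output of Entrez.read is a dictionary-like object, so you can access whichever fields you need. Also, you want efetch rather than esearch: esearch only returns the IDs of the matching records, whereas efetch retrieves the record itself. An accession code such as AY851612 can be passed directly as the id.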

In [1]: from Bio import Entrez, SeqIO

In [3]: Entrez.email = '##############'

In [28]: handle = Entrez.efetch(db="nucleotide", id="AY851612", rettype="gb", retmode="text")

In [29]: x = SeqIO.read(handle, 'genbank')

In [30]: print(x)
ID: AY851612.1
Name: AY851612
Description: Opuntia subulata rpl16 gene, intron; chloroplast.
Number of features: 3
/date=10-APR-2007
/sequence_version=1
/taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons', 'Gunneridae', 'Pentapetalae', 'Caryophyllales', 'Cactineae', 'Cactaceae', 'Opuntioideae', 'Austrocylindropuntia']
/data_file_division=PLN
/references=[Reference(title='Molecular Phylogenetics of the Leafy Cactus Genus Pereskia (Cactaceae)', ...), Reference(title='Direct Submission', ...)]
/keywords=['']
/accessions=['AY851612']
/gi=57240072
/organism=Austrocylindropuntia subulata
/source=chloroplast Austrocylindropuntia subulata
Seq('CATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGA...AGA', IUPACAmbiguousDNA())

In [31]: x.description
Out[31]: 'Opuntia subulata rpl16 gene, intron; chloroplast.'
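The scientific name asked about in the question appears in the printed record as /organism, which suggests it sits in the record's annotations rather than in its description. The following is a minimal sketch under that assumption, not a definitive implementation: fetch_record and scientific_name are hypothetical helper names, and the email argument is a placeholder you would supply. The demo object is a hand-made stand-in that mimics only the annotations attribute of a parsed record, so the lookup can be tried without network access.

```python
from types import SimpleNamespace


def fetch_record(accession, email):
    """Fetch one GenBank record by accession and parse it (needs network access)."""
    # Imported here so scientific_name() below can be tried without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # placeholder: NCBI asks for a contact address
    handle = Entrez.efetch(db="nucleotide", id=accession,
                           rettype="gb", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()


def scientific_name(record):
    """Return (organism, taxonomy) taken from the record's annotations.

    Hypothetical helper: falls back to (None, []) when the keys are absent.
    """
    annotations = record.annotations
    return annotations.get("organism"), annotations.get("taxonomy", [])


# Offline stand-in (an assumption for illustration, not a real SeqRecord):
# it carries only the two annotation keys used above.
demo = SimpleNamespace(annotations={
    "organism": "Austrocylindropuntia subulata",
    "taxonomy": ["Eukaryota", "Viridiplantae"],
})

name, lineage = scientific_name(demo)
print(name)
print(" > ".join(lineage))
```

With network access, scientific_name(fetch_record("AY851612", "you@example.org")) would be the equivalent call on the real record, where the address is again a placeholder. The individual features can be walked in the same spirit by iterating over the record's features list.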
ericmjl