It's easy but annoying (XML craziness involved). First you retrieve your record from Entrez:
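from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address
handle = Entrez.efetch(db="gene",
id="10555",
retmode="xml")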
Now handle is a file-like object containing the XML. You can parse it with Entrez.parse() from Biopython, but I find the XML too entangled to deal with comfortably. The mRNA entries, and the protein accession under each of them, are nested in:
<Entrezgene_comments>
  <Gene-commentary>
    <Gene-commentary_comment>
      <Gene-commentary>
        <Gene-commentary_products>
          <Gene-commentary>
            <Gene-commentary_type value="mRNA">
            <Gene-commentary_products>
              <Gene-commentary>
                <Gene-commentary_type value="peptide">
                <Gene-commentary_accession>NP_001012745</Gene-commentary_accession>
After parsing with Entrez.parse() you'll have a mix of nested dicts and lists to dig through until you reach your accession id. Once you have this id, you can ask Entrez for the sequence with:
handle = Entrez.efetch(db="protein",
id="NP_001012745",
rettype="fasta",
retmode="text")
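The "dig through dicts and lists" step can be sketched as a recursive walk that collects every Gene-commentary_accession value with a given prefix. This is only a rough sketch: find_accessions is a made-up helper name, and the record below is a hand-written toy stand-in that mimics the nesting shown above, not real Entrez output. It relies on the parsed Biopython elements behaving like ordinary dicts, lists and strings.

```python
def find_accessions(node, prefix="NP_"):
    """Recursively collect Gene-commentary_accession values starting with prefix."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Gene-commentary_accession":
                if str(value).startswith(prefix):
                    found.append(str(value))
            else:
                found.extend(find_accessions(value, prefix))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_accessions(item, prefix))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))


# Toy stand-in for one parsed record (assumption: real records are much larger).
record = {
    "Entrezgene_comments": [
        {"Gene-commentary_comment": [
            {"Gene-commentary_products": [
                {"Gene-commentary_accession": "NM_001012727",
                 "Gene-commentary_products": [
                     {"Gene-commentary_accession": "NP_001012745"},
                 ]},
            ]},
        ]},
    ],
}

print(find_accessions(record))          # protein accessions
print(find_accessions(record, "NM_"))   # mRNA accessions
```

With a real response you would call it on each record yielded by Entrez.parse(handle). Matching on the accession prefix sidesteps the exact nesting, at the cost of also picking up accessions that appear elsewhere in the record.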
An alternative approach involves parsing a gene_table. Make the same request as before, but ask for a gene_table instead of XML:
handle = Entrez.efetch(db="gene",
id="10555",
rettype="gene_table",
retmode="text")
In the gene_table you'll find some lines in the form:
mRNA transcript variant 2 NM_001012727.1
protein isoform b precursor NP_001012745.1
Exon table for mRNA NM_001012727.1 and protein NP_001012745.1
You can extract your ids from these lines.
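As a rough sketch of that extraction, a regular expression over the "Exon table for mRNA ... and protein ..." lines gives you the mRNA/protein pairs directly. The helper name mrna_protein_pairs is made up, and the sample text is just the three lines quoted above; a real gene_table has much more around them, and transcripts without a protein will not match this pattern.

```python
import re

# Matches lines like:
#   Exon table for mRNA NM_001012727.1 and protein NP_001012745.1
EXON_TABLE_RE = re.compile(
    r"Exon table for\s+mRNA\s+(\S+)\s+and\s+protein\s+(\S+)"
)


def mrna_protein_pairs(gene_table_text):
    """Return a list of (mRNA accession, protein accession) tuples."""
    return EXON_TABLE_RE.findall(gene_table_text)


sample = """mRNA transcript variant 2 NM_001012727.1
protein isoform b precursor NP_001012745.1
Exon table for mRNA NM_001012727.1 and protein NP_001012745.1
"""

for mrna, protein in mrna_protein_pairs(sample):
    print(mrna, protein)
```

On a live request you would pass handle.read() instead of sample.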