I have written the following function to extract data from PubMed using Entrez:

    from Bio import Entrez, Medline

    def getFromPubMed(id):
        # Fetch the record in MEDLINE text format and parse it
        handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text", id=str(id))
        records = Medline.parse(handle)
        for record in records:
            abstract = str(record["AB"])
            # List-valued fields are flattened to a plain comma-separated string
            mesh = str(record["MH"]).replace("'", "").replace("[", "").replace("]", "")
            pmid = str(record["PMID"])
            title = str(record["TI"]).replace("'", "").replace("[", "").replace("]", "")
            pt = str(record["PT"]).replace("'", "").replace("[", "").replace("]", "")
            au = str(record["AU"])
            dp = str(record["DP"])
            la = str(record["LA"])
            pmc = str(record["PMC"])
            si = str(record["SI"])
            # Fall back to the DOI embedded in the source field if AID is missing
            try:
                doi = str(record["AID"])
            except:
                doi = str(record["SO"]).split('doi:', 1)[1]
            return pmid, title, abstract, au, mesh, doi, pt, la, pmc

However, this function does not always work, because not all MEDLINE records contain every field, and looking up a missing field raises a KeyError. For example, this PMID doesn't contain any MeSH headings.
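
The failure can be reproduced without a network call. This is only a minimal sketch: a plain dict with made-up values stands in for a parsed record (an assumption; it relies on the record behaving like a dictionary on lookup), and the "MH" field is deliberately left out:

```python
# A plain dict stands in for a parsed MEDLINE record.
# All values are hypothetical; note that "MH" (MeSH headings) is absent.
record = {"PMID": "00000000", "TI": "Example title", "AB": "Example abstract."}

try:
    mesh = str(record["MH"])  # same kind of lookup as in getFromPubMed
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError('MH')
```

Any of the other lookups in the function would fail the same way on a record that lacks the corresponding field.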
I could wrap each item in a try-except statement, for example for the abstract:

    try:
        abstract = str(record["AB"])
    except:
        abstract = ""

but it seems a clunky way to implement this. What's a more elegant solution?