0

I have written the following function to extract data from PubMed using Entrez:

from Bio import Entrez, Medline

def getFromPubMed(id):
    handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=str(id))
    records = Medline.parse(handle)
    for record in records:
        abstract = str(record["AB"])
        mesh = str(record["MH"]).replace("'", "").replace("[", "").replace("]", "")
        pmid = str(record["PMID"])
        title = str(record["TI"]).replace("'", "").replace("[", "").replace("]", "")
        pt = str(record["PT"]).replace("'", "").replace("[", "").replace("]", "")
        au = str(record["AU"])
        dp = str(record["DP"])
        la = str(record["LA"])
        pmc = str(record["PMC"])
        si = str(record["SI"])
        try:
            doi=str(record["AID"])
        except:
            doi = str(record["SO"]).split('doi:',1)[1]
        return pmid, title, abstract, au, mesh, doi, pt, la, pmc

However, this function does not always work, because not all MEDLINE records contain every field, and looking up a missing field raises a KeyError. For example, this PMID doesn't contain any MeSH headings.

I could wrap each lookup in a try-except statement, for example for the abstract:

try:
  abstract = str(record["AB"])
except:
  abstract = ""

but this seems a clunky way to implement it. What's a more elegant solution?

jdoe

2 Answers

2

You could split the field extraction off into a separate function, doing something like the below:

def get_record_attributes(record, attr_details):
    attributes = {}

    for attr_name, details in attr_details.items():
        value = ""
        try:
            value = record[details["key"]]

            for char in details["chars_to_remove"]:
                value = value.replace(char, "")
        except (KeyError, AttributeError):
            pass

        attributes[attr_name] = value

    return attributes

def getFromPubMed(id):
    handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=str(id))
    records = Medline.parse(handle)
    for record in records:
        attr_details = {
            "abstract" : {"key" : "AB"},
            "mesh" : { "key" : "MH", "chars_to_remove" : "'[]"},
            #...
            "aid" : {"key" : "AID"},
            "so" : {"key" : "SO"},
        }

        attributes = get_record_attributes(record, attr_details)

        #...
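
A simpler variant of the same idea is sketched below: it uses `dict.get` with a default instead of try/except. This assumes the record behaves like a plain dict (a hand-made dict stands in for a real MEDLINE record here), and `extract_fields` and `FIELD_KEYS` are hypothetical names chosen for illustration, not part of Biopython.

```python
# Hypothetical mapping from output names to MEDLINE field tags.
FIELD_KEYS = {
    "pmid": "PMID",
    "title": "TI",
    "authors": "AU",
    "mesh": "MH",
}


def extract_fields(record, field_keys):
    """Return {name: text}, using "" for any field absent from the record."""
    fields = {}
    for name, key in field_keys.items():
        value = record.get(key, "")  # no KeyError when the tag is missing
        if isinstance(value, list):
            # Multi-valued fields (authors, MeSH terms) are joined into one string.
            value = ", ".join(value)
        fields[name] = value
    return fields


# Stand-in record with no "MH" entry, mimicking a record without MeSH headings.
record = {"PMID": "12345", "TI": "An example title", "AU": ["Doe J", "Roe R"]}
fields = extract_fields(record, FIELD_KEYS)
```

Joining list values avoids the `str(...).replace("[", "")` cleanup from the question, at the cost of treating every list field the same way.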
stuartgm
    This works really well. Thanks. Note it should be `attributes[attr_name] = value` not `attribute[attr_name] = value`. – jdoe Feb 04 '18 at 01:55
-1

What about:

mesh = str(record["MH"] or '')

since an empty dictionary is falsy, as this post suggests.
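
One caveat worth checking: the `or` fallback only applies after the lookup has succeeded, so indexing a key that is absent still raises KeyError. A minimal sketch of the difference, assuming the record behaves like a plain dict (the `record` below is a hand-made stand-in, not real PubMed data):

```python
record = {"PMID": "12345"}  # stand-in record with no "MH" entry

try:
    mesh = str(record["MH"] or "")
except KeyError:
    # The subscript fails before `or` is ever evaluated.
    mesh = None

# dict.get returns None for a missing key, so `or` can supply the default.
mesh_safe = str(record.get("MH") or "")
```

With `.get`, the fallback covers both a missing key and an empty value.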

panchtox