I am looking for an efficient way to query Entrez (via Biopython) for the number of PubMed articles associated with a given indication/condition. I only have the list of full indication names.
I have worked out a way, but it is quite imprecise: it does not account for biases arising from how the disease name is described or written. Ideally, I would like to retrieve the MeSH term associated with a condition and then find the number of articles associated with that MeSH term.
Thank you a lot,
Federico
EDIT 1:
Please add your code, otherwise you probably won't get an answer.
Yes, sorry:
from Bio import Entrez
query = "aneurysm"
handle1 = Entrez.esearch(db="mesh", term=query)
record1 = Entrez.read(handle1)
handle1.close()
Basically, the code above starts from a disease name and tries to retrieve the MeSH codes for that disease. The problem is that this approach is very unstable and error-prone: for instance, writing "diabetes", "diabetes type II" or "diabetes type 2" produces slightly different results.
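One way to make the spelling problem visible is to restrict the search to PubMed's `[MeSH Terms]` field and look at how PubMed translated the query. The following is only a minimal sketch, not a tested solution: the helper names `mesh_query` and `count_pubmed` are invented for illustration, the e-mail address is a placeholder, and it is an assumption that the parsed result carries a `QueryTranslation` entry. Whether a given free-text spelling maps to a MeSH heading depends on PubMed itself, so a count of 0 is possible.

```python
def mesh_query(term):
    """Build a PubMed query restricted to the MeSH Terms field."""
    cleaned = " ".join(term.split())  # collapse repeated whitespace
    return f'"{cleaned}"[MeSH Terms]'


def count_pubmed(query, email="you@example.com"):
    """Return (article count, query translation) for a PubMed query.

    Needs network access and Biopython. The import is local so that
    mesh_query can be used on its own.
    """
    from Bio import Entrez

    Entrez.email = email  # placeholder address, replace with your own
    handle = Entrez.esearch(db="pubmed", term=query, retmax=0)
    record = Entrez.read(handle)
    handle.close()
    # "QueryTranslation" is assumed to be present; .get avoids a KeyError
    # if it is not.
    return int(record["Count"]), record.get("QueryTranslation", "")
```

Calling `count_pubmed(mesh_query("diabetes type 2"))` and `count_pubmed(mesh_query("diabetes type II"))` and comparing the two translations would show whether both spellings end up at the same heading.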
For these reasons, and since I also have the clinical trial identifiers (NCT IDs), I tried a more structured approach:
import pandas as pd
from Bio import Entrez

Entrez.email = "mymail@gmail.com"
#search_results = Entrez.read(Entrez.esearch(db="pubmed", term = "NCT00000419[SI]"))
#count = int(search_results["Count"])
#records = count
handle1 = Entrez.esearch(db="pubmed", retmax=10, term="NCT00646048[si]", idtype="acc")
record1 = Entrez.read(handle1)
handle1.close()
int(record1["Count"]) >= 2
I used "NCT00000419[SI]" based on the PubMed section of the article [Linking ClinicalTrials.gov and PubMed to Track Results of Interventional Human Clinical Trials][1].
The two attempts above are of course simple ones, and my final goal is still to retrieve the number of articles associated with an indication. Going through the NCT ID is one way to do that, since NCT records apparently also contain MeSH terms.
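The NCT-based count can be wrapped in a small function so that it can be applied to a whole list of identifiers. This is a sketch under assumptions: `nct_query` and `count_for_nct` are hypothetical helper names, the e-mail address is a placeholder, and the check only verifies the usual "NCT plus eight digits" shape of the identifier, not that the trial exists.

```python
import re

# ClinicalTrials.gov identifiers have the form "NCT" followed by 8 digits.
NCT_PATTERN = re.compile(r"NCT\d{8}")


def nct_query(nct_id):
    """Build a PubMed query on the secondary source ID field ([si])."""
    nct_id = nct_id.strip().upper()
    if not NCT_PATTERN.fullmatch(nct_id):
        raise ValueError(f"not an NCT identifier: {nct_id!r}")
    return f"{nct_id}[si]"


def count_for_nct(nct_id, email="you@example.com"):
    """Return the number of PubMed records that cite the given NCT ID.

    Needs network access and Biopython (imported locally so that
    nct_query can be used on its own).
    """
    from Bio import Entrez

    Entrez.email = email  # placeholder address, replace with your own
    handle = Entrez.esearch(db="pubmed", term=nct_query(nct_id), retmax=0)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])
```

Note that this counts articles that cite the trial, which is not necessarily the same as the number of articles about the indication the trial studies.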
Thanks again!
EDIT 2: I have tried something like the following, but again, the indication levels are too broad. I would like to find a more objective way to count the number of articles. The best option, to me, is to use the NCT ID:
df = pd.read_stata("/Users/federiconutarelli/Desktop/First_work/PubMed/indictations_nomatch.dta")
# Assumption: the indications are stored in a column named "indicationlevel3"
indicationlevel3 = df["indicationlevel3"].tolist()
years = [2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013]
records = {}
for indication in indicationlevel3:
    for year in years:
        records[(indication, year)] = 0
search_results = {}
count = {}
Entrez.email = "mymail@gmail.com"
for indication in indicationlevel3:
    for year in years:
        # One PubMed search per (indication, year), restricted by publication date
        search_results[(indication, year)] = Entrez.read(Entrez.esearch(
            db="pubmed",
            term=indication,
            mindate=year, maxdate=year, datetype="pdat",
            usehistory="y"))
        count[(indication, year)] = int(search_results[(indication, year)]["Count"])
        #records[(indication, year)].append(count[(indication, year)])
        records[(indication, year)] = count[(indication, year)]
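The same per-year counting can be written more compactly by separating the single search from the loop over indications and years. This is only a sketch: `pubmed_count` and `yearly_counts` are hypothetical names, the e-mail address is a placeholder, and the counting function is passed in as a parameter purely so that the loop can be tried out without network access.

```python
from itertools import product


def pubmed_count(term, year, email="you@example.com"):
    """Count PubMed records matching `term` published in `year`.

    Needs network access and Biopython (imported locally so that
    yearly_counts can be used with another counting function).
    """
    from Bio import Entrez

    Entrez.email = email  # placeholder address, replace with your own
    handle = Entrez.esearch(db="pubmed", term=term, retmax=0,
                            mindate=str(year), maxdate=str(year),
                            datetype="pdat")
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])


def yearly_counts(indications, years, count_fn=pubmed_count):
    """Return {(indication, year): count} for every combination."""
    return {
        (indication, year): count_fn(indication, year)
        for indication, year in product(indications, years)
    }
```

With pandas available, `pd.Series(yearly_counts(indicationlevel3, years)).unstack()` should give an indication-by-year table, since the tuple keys become a two-level index.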
To use the NCT ID, I have tried:
Entrez.email = "mymail@gmail.com"
id= "NCT00646048[si]"
handle = Entrez.efetch(db="pubmed", id=id, rettype="gb", retmode="xml")
record = Entrez.read(handle)
abstract=record['PubmedArticle'][0]['MedlineCitation']['Article']
abstract
But it does not seem to work.
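A likely reason is that `efetch` expects PubMed IDs in its `id` argument, not a search expression such as `NCT00646048[si]`. A possible two-step version, searching first and fetching afterwards, is sketched below. The function names `fetch_articles_for_nct` and `mesh_descriptors` are invented for illustration, the e-mail address is a placeholder, and the record layout (`MedlineCitation`, `MeshHeadingList`, `DescriptorName`) is assumed to follow the usual PubMed XML as parsed by `Entrez.read`.

```python
def mesh_descriptors(article):
    """List the MeSH descriptor names of one parsed PubmedArticle entry."""
    # Not every record has been indexed with MeSH terms yet.
    headings = article["MedlineCitation"].get("MeshHeadingList", [])
    return [str(heading["DescriptorName"]) for heading in headings]


def fetch_articles_for_nct(nct_id, email="you@example.com"):
    """Return the parsed PubMed articles that cite the given NCT ID.

    Needs network access and Biopython (imported locally so that
    mesh_descriptors can be used on its own).
    """
    from Bio import Entrez

    Entrez.email = email  # placeholder address, replace with your own
    # Step 1: search with the [si] tag to obtain the PubMed IDs.
    handle = Entrez.esearch(db="pubmed", term=f"{nct_id}[si]")
    pmids = Entrez.read(handle)["IdList"]
    handle.close()
    if not pmids:
        return []
    # Step 2: fetch the full records for those PubMed IDs.
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids), retmode="xml")
    record = Entrez.read(handle)
    handle.close()
    return record["PubmedArticle"]
```

The MeSH terms of each article returned by `fetch_articles_for_nct("NCT00646048")` could then be listed with `mesh_descriptors(article)`.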
[1]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3706420/