I have a program that retrieves a list of PubMed publications, and I want to build a co-authorship graph: for each article, add each author as a vertex (if not already present) and add an undirected edge between every pair of co-authors (or increase its weight if that edge already exists).
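The edge rule described above (one undirected edge per pair of co-authors, weighted by the number of shared articles) can be sketched with the standard library alone. This is a minimal sketch, assuming the authors arrive as a list of lists with one inner list per article; `count_coauthor_pairs` and the sample names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_coauthor_pairs(author_lists):
    """Count how many articles each unordered pair of authors shares.

    author_lists is assumed to be a list of lists, one inner list of
    author names per article (hypothetical helper for illustration).
    """
    weights = Counter()
    for authors in author_lists:
        # Deduplicate and sort so (A, B) and (B, A) map to the same key.
        unique = sorted(set(authors))
        # combinations() yields every unordered pair exactly once.
        for pair in combinations(unique, 2):
            weights[pair] += 1
    return weights


# Hypothetical sample data: three articles.
papers = [
    ["Smith J", "Doe A", "Lee K"],
    ["Doe A", "Smith J"],
    ["Lee K"],
]
pairs = count_coauthor_pairs(papers)
print(pairs[("Doe A", "Smith J")])  # 2
```

Each key of the `Counter` is an edge and each value is its weight; a single-author article contributes no pairs.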
I have written the first part of the program, which retrieves the list of authors for each publication, and I understand that I could use the NetworkX library to build the graph (and then export it to GraphML for Gephi), but I cannot wrap my head around how to transform the "list of lists" into a graph.
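One way to do that transformation with NetworkX is to loop over the inner lists, add every author with `add_nodes_from`, and use `itertools.combinations` to visit each unordered pair once, incrementing a `weight` attribute when the edge already exists. This is a sketch under those assumptions; `build_coauthor_graph`, the sample data and the output file name `coauthors.graphml` are hypothetical.

```python
from itertools import combinations

import networkx as nx


def build_coauthor_graph(author_lists):
    """Build an undirected, weighted co-authorship graph (hypothetical helper)."""
    G = nx.Graph()
    for authors in author_lists:
        unique = sorted(set(authors))
        # Every author becomes a vertex, even one who wrote alone.
        G.add_nodes_from(unique)
        for a, b in combinations(unique, 2):
            if G.has_edge(a, b):
                # Edge seen before: one more shared article.
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)
    return G


# Hypothetical sample data: three articles.
papers = [
    ["Smith J", "Doe A", "Lee K"],
    ["Doe A", "Smith J"],
    ["Lee K"],
]
G = build_coauthor_graph(papers)
print(G.number_of_nodes(), G.number_of_edges())  # 3 3
print(G["Doe A"]["Smith J"]["weight"])  # 2

# Export for Gephi (hypothetical file name).
nx.write_graphml(G, "coauthors.graphml")
```

Deduplicating each inner list matters: `combinations(["A", "A"], 2)` would yield `("A", "A")` and create a self-loop. `nx.write_graphml` stores `weight` as a GraphML edge attribute.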
Here is my code. Thank you very much.
```python
### if needed, install the required modules
### python3 -m pip install biopython
### python3 -m pip install numpy
from Bio import Entrez
from Bio import Medline

Entrez.email = "rja@it.com"
handle = Entrez.esearch(db="pubmed", term='("lung diseases, interstitial"[MeSH Terms] NOT "pneumoconiosis"[MeSH Terms]) AND "artificial intelligence"[MeSH Terms] AND "humans"[MeSH Terms]', retmax="1000", sort="relevance", retmode="xml")
records = Entrez.read(handle)
handle.close()
ids = records['IdList']
h = Entrez.efetch(db='pubmed', id=ids, rettype='medline', retmode='text')
# now h holds all of the articles and their sections
records = Medline.parse(h)
# initialize an empty list for the authors
authors = []
# iterate through all articles
for record in records:
    # for each article (record) get the author list; [] if it has none
    # (a '?' default would be iterated and added as an author named '?')
    au = record.get('AU', [])
    # now iterate through each author in that list
    for a in au:
        if a not in authors:
            authors.append(a)
h.close()
# the following just shows the alphabetical list of all unique
# authors (these should become my graph nodes)
authors.sort()
print('Authors: {0}'.format(', '.join(authors)))
```
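The script above flattens everything into one deduplicated `authors` list, which discards which authors appeared together on the same article. A minimal sketch that instead keeps one author list per article (the "list of lists" the graph code needs) follows; `author_lists_from_records` and the stand-in records are hypothetical, and it assumes each record behaves like a dict, as the `record.get('AU', ...)` call in the script already does. Because `Medline.parse` returns a generator that can be iterated only once, collecting the lists in a single pass avoids a second loop that silently sees no records.

```python
def author_lists_from_records(records):
    """Return one list of author names per record (hypothetical helper).

    records is any iterable of dict-like objects; Bio.Medline.parse()
    yields records where the 'AU' key holds the author names.
    """
    author_lists = []
    for record in records:
        # Articles without an 'AU' field are skipped instead of adding '?'.
        authors = record.get("AU", [])
        if authors:
            author_lists.append(list(authors))
    return author_lists


# Stand-in records for illustration; in the real script pass Medline.parse(h).
fake_records = [
    {"AU": ["Smith J", "Doe A"]},
    {"TI": "An article with no AU field"},
    {"AU": ["Doe A"]},
]
lists = author_lists_from_records(fake_records)
# The flat, sorted node list can still be derived from the per-article lists.
all_authors = sorted({a for authors in lists for a in authors})
print(lists)        # [['Smith J', 'Doe A'], ['Doe A']]
print(all_authors)  # ['Doe A', 'Smith J']
```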