I developed a software with PyQt and sqlite to manage scientific articles. Each article is stored in the sqlite database, and comes from a particular journal.
Sometimes, I need to perform some verifications on the articles of a journal. So I build two lists, one containing the DOI of the articles (a DOI is just a unique id for an article), and one containing booleans, True if the articles are ok, False if the articles are not:
def listDoi(self, journal_abb):
"""Function to get the doi from the database.
Also returns a list of booleans to check if the data are complete"""
list_doi = []
list_ok = []
query = QtSql.QSqlQuery(self.bdd)
query.prepare("SELECT * FROM papers WHERE journal=?")
query.addBindValue(journal_abb)
query.exec_()
while query.next():
record = query.record()
list_doi.append(record.value('doi'))
if record.value('graphical_abstract') != "Empty":
list_ok.append(True)
else:
list_ok.append(False)
return list_doi, list_ok
This function returns the two lists. The lists can contain ~2000 items each. After that, to check if an article is ok, I just check if it is in the two lists.
EDIT: I also need to check if an article is only in list_doi.
So I wonder, because performance matters here: what is faster/better/more economic:
- build the two lists, and check if the article is present in the two lists
- write the function in another way: checkArticle(doi_article), and the function would perform a SQL query for each article
What about the speed and the space in RAM ? Will the results be different if there are few items, or a lot of them ?