tl;dr: What is different about an esummary list produced by rentrez, and why do such lists stop working with other rentrez functions after they are merged using append()?
I am accessing PubMed using rentrez. I can search for publications and download esummaries without any problem. However, there must be something special about an esummary list that I do not understand, because things fall apart when I use append() to merge lists. I have not been able to work out what that difference is from the documentation. Here is the code that lets me search PubMed and download records without problems:
# load rentrez for entrez_search() and entrez_summary()
library(rentrez)
# set search term and retmax
term_set <- '"Transcription, Genetic"[Mesh] AND "Regulatory Sequences, Nucleic Acid"[Mesh] AND 2017:2018[PDAT]'
retmax_set <- 500
# search pubmed using web history
search.l <- entrez_search(db = "pubmed", term = term_set, use_history = TRUE)
# get summaries of search hits using web history (one list element per batch)
summary.l <- list()
for (seq_start in seq(0, search.l$count, retmax_set)) {
  summary.l[[length(summary.l) + 1]] <- entrez_summary(
    db = "pubmed",
    web_history = search.l$web_history,
    retmax = retmax_set,
    retstart = seq_start
  )
}
However, building the result with summary.l <- list() and then summary.l[[length(summary.l)+1]] <- entrez_summary(... produces a list of lists of esummaries (three sub-lists for this search, one per batch). That forces additional for loops in the subsequent data-extraction steps (below) and is an unwieldy data structure.
# load magrittr for %>% and data.table for rbindlist()
library(magrittr)
library(data.table)
# extract desired information from esummary, convert to dataframe
faut.laut.l <- list()
for (i in seq_along(summary.l)) {
  faut.laut <- summary.l[[i]] %>%
    extract_from_esummary(
      c("uid", "sortfirstauthor", "lastauthor"),
      simplify = FALSE
    )
  faut.laut.l <- c(faut.laut.l, faut.laut)
}
faut.laut.df <- rbindlist(faut.laut.l)
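The loop above walks the batches one at a time. A base-R sketch of the same extraction on mock data, flattening the batches first with unlist(recursive = FALSE) so that no outer loop is needed. The records and their field values are invented placeholders, not real PubMed data, and do.call(rbind, ...) stands in for data.table::rbindlist() to keep the example self-contained. Note that the flattened list has no class of its own, so here the fields are pulled from each record directly rather than passed to extract_from_esummary().

```r
# Mock stand-in for summary.l: two batches, each a named list of records.
# Values are placeholders; real records would come from entrez_summary().
make_record <- function(uid, first, last) {
  structure(
    list(uid = uid, sortfirstauthor = first, lastauthor = last),
    class = c("esummary", "list")
  )
}
summary_batches <- list(
  list(`1` = make_record("1", "Author A", "Author B"),
       `2` = make_record("2", "Author C", "Author D")),
  list(`3` = make_record("3", "Author E", "Author F"))
)

# Flatten one level: a single list holding every record from every batch
records <- unlist(summary_batches, recursive = FALSE)

# Pull the wanted fields out of each record and stack them as rows
fields <- c("uid", "sortfirstauthor", "lastauthor")
rows <- lapply(records, function(rec) {
  as.data.frame(unclass(rec)[fields], stringsAsFactors = FALSE)
})
faut_laut_df <- do.call(rbind, unname(rows))
print(faut_laut_df)
```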
Using append() in the code below gives a single list of all 1334 esummaries, avoiding the sub-lists.
# get summaries of search hits using web history
for (seq_start in seq(0, search.l$count, retmax_set)) {
  batch <- entrez_summary(
    db = "pubmed",
    web_history = search.l$web_history,
    retmax = retmax_set,
    retstart = seq_start
  )
  if (seq_start == 0) {
    # the first batch starts the list (appending it as well would duplicate it)
    summary.append.l <- batch
  } else {
    # later batches are appended to the end
    summary.append.l <- append(summary.append.l, batch)
  }
}
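A likely explanation, sketched on mock objects: append() is a thin wrapper around c(), and c() applied to lists keeps names but drops the other attributes of its arguments, including class. The sketch assumes that entrez_summary() returns a list with class c("esummary_list", "list") whose elements have class "esummary"; that is an assumption worth confirming with class(summary_small). make_esummary_list() is a hypothetical helper that only builds look-alike objects.

```r
# Hypothetical helper: builds an object shaped like an entrez_summary() result
# (assumed outer class c("esummary_list", "list"), elements of class "esummary")
make_esummary_list <- function(uids) {
  recs <- lapply(uids, function(u) {
    structure(list(uid = u), class = c("esummary", "list"))
  })
  names(recs) <- uids
  structure(recs, class = c("esummary_list", "list"))
}

batch1 <- make_esummary_list(c("1", "2"))
batch2 <- make_esummary_list("3")

# append() calls c() internally, which drops the outer class attribute
merged <- append(batch1, batch2)

print(class(batch1))         # "esummary_list" "list"
print(class(merged))         # "list": the outer class is gone
print(class(merged[["3"]]))  # "esummary" "list": the records keep theirs
print(length(merged))        # 3
```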
However, in the subsequent step extract_from_esummary() throws an error, even though the documentation states that the esummaries argument should be a list of esummary objects.
# extract desired information from esummary, convert to dataframe
faut.laut.append.l <- extract_from_esummary(
  esummaries = summary.append.l,
  elements = c("uid", "sortfirstauthor", "lastauthor"),
  simplify = FALSE
)
Error in UseMethod("extract_from_esummary", esummaries) :
no applicable method for 'extract_from_esummary' applied to an object of class "list"
faut.laut.append.df <- rbindlist(faut.laut.append.l)
Error in rbindlist(faut.laut.append.l) :
object 'faut.laut.append.l' not found
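The first error message shows that extract_from_esummary() is an S3 generic dispatched with UseMethod(): R looks for a method named after each entry of class(esummaries) and, failing that, a default method. A minimal sketch with a hypothetical generic, describe_summary(), that has methods only for the esummary classes. This mirrors what the error implies about rentrez but is not its actual code.

```r
# Hypothetical generic standing in for extract_from_esummary(): methods are
# defined for "esummary_list" and "esummary" only, with no default method.
describe_summary <- function(x, ...) UseMethod("describe_summary")
describe_summary.esummary <- function(x, ...) paste("record", x$uid)
describe_summary.esummary_list <- function(x, ...) {
  vapply(x, describe_summary, character(1))
}

recs <- list(
  `1` = structure(list(uid = "1"), class = c("esummary", "list")),
  `2` = structure(list(uid = "2"), class = c("esummary", "list"))
)

# With the outer class, dispatch finds describe_summary.esummary_list
classed <- structure(recs, class = c("esummary_list", "list"))
print(describe_summary(classed))

# Without it, the implicit class is just "list", so dispatch fails
plain_error <- tryCatch(describe_summary(recs), error = conditionMessage)
print(plain_error)
```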
A search that yields fewer than 500 records can be retrieved with a single call to entrez_summary() and does not require concatenating lists. As a result, the code below works.
# set search term and retmax
term_set_small <- 'kadonaga[AUTH]'
retmax_set <- 500
# search pubmed using web history
search_small <- entrez_search(db = "pubmed", term = term_set_small, use_history = TRUE)
# get summaries from search with fewer than 500 hits
summary_small <- entrez_summary(
  db = "pubmed",
  web_history = search_small$web_history,
  retmax = retmax_set
)
# extract desired information from esummary, convert to dataframe
faut.laut_small <- extract_from_esummary(
  esummaries = summary_small,
  elements = c("uid", "sortfirstauthor", "lastauthor"),
  simplify = FALSE
)
faut.laut_small.df <- rbindlist(faut.laut_small)
Why does append() break the esummaries, and can this be avoided? Thanks.
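One possible way around this, sketched under the same assumptions about the rentrez classes: merge the batches with a small hypothetical helper, merge_keep_class(), that concatenates the underlying lists and then copies the class of the first argument back onto the result. Because it copies whatever class the first batch carries, it does not hard-code the rentrez class names. The mock batches stand in for entrez_summary() results.

```r
# Hypothetical helper: concatenate list objects with c(), then restore the
# class of the first argument, which c() and append() would otherwise drop.
merge_keep_class <- function(...) {
  parts <- list(...)
  merged <- do.call(c, lapply(parts, unclass))
  class(merged) <- class(parts[[1]])
  merged
}

# Mock batches shaped like entrez_summary() results (assumed classes)
mock_batch <- function(uids) {
  recs <- lapply(uids, function(u) {
    structure(list(uid = u), class = c("esummary", "list"))
  })
  names(recs) <- uids
  structure(recs, class = c("esummary_list", "list"))
}

merged <- merge_keep_class(mock_batch(c("1", "2")), mock_batch("3"))
print(class(merged))
print(names(merged))
```

In the question's loop, this would mean calling merge_keep_class(summary.append.l, ...) in place of each append() call. An equivalent one-off fix is to restore the class after the loop with class(summary.append.l) <- c("esummary_list", "list"), assuming that is what class(summary_small) reports. A more conservative option that does not depend on the outer class at all is to call extract_from_esummary() on each batch separately, as the first extraction loop already does.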