I'm trying to find a list of SNPs that have PubMed entries using the rentrez package. When I run the code below, I end up with a NULL data frame. I think I'm not writing the data frame correctly.

library(rentrez)

term <- c('AKR1C1[GENE] AND snp_pubmed[Filter] AND Homo sapiens[Organism]',
'AKR1C2[GENE] AND snp_pubmed[Filter] AND Homo sapiens[Organism]')

p.snps <- for (i in seq_along(term)) {
  entrez_search(db="SNP",
                term = i,
                usehistory = "y"
                )
}

I would like to do this for approximately 100 Genes.

zx8754
JoeShmo
    `for` loops don't store the results. You might instead consider `p.snps <- lapply(seq_along(term), function(i) entrez_search(...))`. – r2evans Apr 12 '17 at 23:49

1 Answer

Problems

There are several problems:

  • a for loop does not return a value, so assigning its result leaves p.snps as NULL
  • the second argument to entrez_search should be a character string representing the term, but the code in the question passes it the loop index i, which is a number
  • the question refers to a data frame, but the natural way to return this is a list of "esearch" objects (although this could later be transformed further)
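
The first two points can be checked without contacting NCBI. This is only a small base R sketch; the strings in term are placeholders standing in for the real queries:

```r
term <- c("first query", "second query")  # placeholder strings, not real queries

# The value of a for loop is NULL, whatever its body computes.
res <- for (i in seq_along(term)) toupper(term[i])
is.null(res)

# seq_along() yields integer positions, so `i` is a number;
# term[i] is the character string that a search function expects.
idx <- seq_along(term)
class(idx)
class(term[idx[1]])
```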

Corrected code

Try this:

p.snps <- vector(length = length(term), mode = "list")

for (i in seq_along(term)) {
  p.snps[[i]] <- entrez_search(db = "SNP", term = term[i], usehistory = "y")
}
names(p.snps) <- term

A shorter alternative, all in one line:

p.snps <- sapply(term, entrez_search, db = "SNP", usehistory = "y", simplify = FALSE)
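
For around 100 genes, typing each query by hand gets tedious. One possible approach is to build the term vector from a vector of gene symbols. The sketch below assumes the same query pattern as in the question; the genes vector and the use of gene symbols as names are my own illustrative choices, and the search call itself is left commented out because it needs network access:

```r
# Gene symbols to query; extend this vector to cover all the genes of interest.
genes <- c("AKR1C1", "AKR1C2")

# Fill each symbol into the query pattern used in the question.
term <- sprintf("%s[GENE] AND snp_pubmed[Filter] AND Homo sapiens[Organism]", genes)

# Naming the queries by gene makes the resulting list (and any stacked
# data frame built from it) carry short gene labels instead of full queries.
names(term) <- genes

# Then run the searches as above (requires rentrez and network access):
# p.snps <- sapply(term, entrez_search, db = "SNP", usehistory = "y", simplify = FALSE)
```

Because term is named, the list returned by sapply(..., simplify = FALSE) should be named by gene symbol rather than by the full query string.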

Long form data frame

To convert this list into a long form data frame of ids with a second column giving the query:

ids <- lapply(p.snps, "[[", "ids")
stack(ids)

giving:

     values                                                            ind
1  41314625 AKR1C1[GENE] AND snp_pubmed[Filter] AND Homo sapiens[Organism]
2  17344137 AKR1C1[GENE] AND snp_pubmed[Filter] AND Homo sapiens[Organism]
3  11548049 AKR1C1[GENE] AND snp_pubmed[Filter] AND Homo sapiens[Organism]
4   7097713 AKR1C1[GENE] AND snp_pubmed[Filter] AND Homo sapiens[Organism]
...etc...

If you would prefer index values (1, 2, ...) instead of query strings, run this statement before the stack statement:

names(ids) <- seq_along(ids)

In that case the output of the stack statement would be:

     values ind
1  41314625   1
2  17344137   1
3  11548049   1
4   7097713   1
5   3930965   1
6   3763675   1
...etc...
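
To see what stack does without running any searches, here is a small sketch on a hand-made list. The ids and the names geneA and geneB are placeholders, not real search results, and the column names snp_id and query are just a suggestion:

```r
# Placeholder ids, NOT real search results.
ids <- list(geneA = c("101", "102", "103"),
            geneB = c("201", "202"))

# stack() returns one row per id, with the list names in the `ind` column;
# setNames() gives the two columns more descriptive names.
long <- setNames(stack(ids), c("snp_id", "query"))

# Number of ids found per query.
table(long$query)
```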
G. Grothendieck
  • This is exactly what I was looking for. I was thinking a `sapply` or `lapply` approach might be best. Thanks! – JoeShmo Apr 13 '17 at 16:47