I'm trying to download DNA sequence data from NCBI using entrez_fetch. With the following code, I perform a search for the IDs of the sequences I need with entrez_search, and then I attempt to download the sequence data in FASTA format:
library(rentrez)
# Search for sequence ids
search <- entrez_search(db = "biosample",
                        term = "Escherichia coli[Organism] AND geo_loc_name=USA:WA[attr]",
                        retmax = 9999, use_history = TRUE)
search$ids
length(search$ids)
search$web_history
# Download sequence data
ecoli_fasta <- entrez_fetch(db = "nuccore",
                            web_history = search$web_history,
                            rettype = "fasta")
When I do this, I get the following error:
Error: HTTP failure: 400
Cannot+retrieve+query+from+history
I don't understand what this means and Googling hasn't led me to an answer.
I tried using a different package (ape) and its function read.GenBank to download the sequences as an alternative, but this method only managed to download about 1000 of the 12000 sequences I needed. I would like to use entrez_fetch if possible. Does anyone have any insight for me?