I wanted to download FASTQ files associated with a particular BioProject (PRJEB21446) from the European Nucleotide Archive. There is a button to generate and download a shell script containing wget commands for all FASTQ files associated with the BioProject. Great! That gives me a script with the following commands:
wget -nc [ftp-link-to-sample1.fastq.gz]
wget -nc [ftp-link-to-sample2.fastq.gz]
...
wget -nc [ftp-link-to-sample40.fastq.gz]
EDIT: Here are the first 5 lines of the script from ENA:
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/004/ERR2014384/ERR2014384_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/006/ERR2014386/ERR2014386_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/001/ERR2014361/ERR2014361_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/009/ERR2014369/ERR2014369_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR201/007/ERR2014367/ERR2014367_1.fastq.gz
However, when I tried to run the script using sh script_from_ENA.sh, the first file downloaded without any problems, but every file after that was stuck at 0% for about 20 seconds and then showed the following:
2023-08-14 10:54:01 (0.00 B/s) - Data transfer aborted.
Retrying.
wget then attempts to download the same file over and over again with no success.
After spending all morning trying various workarounds, I eventually solved the problem by putting all the URLs into a single file and running wget in a for loop, like so:
sed 's/wget -nc //' script_from_ENA.sh > url-list
for i in `cat url-list` ; do wget -nc $i ; done
This worked like a charm and the files downloaded without any problem, but I'm still curious as to why the script generated by ENA didn't work. Was it an issue with wget or the ENA servers cutting me off?
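For reference, the same workaround can be written a little more defensively. This is only a sketch of the approach described above, not a fix for the underlying problem: extract_urls and download_all are hypothetical helper names I made up, and it assumes every line of the ENA script starts with "wget -nc" followed by a single URL.

```shell
# Hypothetical helper: print one URL per line from an ENA-style script.
# Assumes each relevant line has the form "wget -nc <url>".
extract_urls() {
    sed -n 's/^wget -nc[[:space:]]*//p' "$1"
}

# Hypothetical helper: read URLs from stdin and fetch each one with its
# own wget call, quoting the URL so the shell never splits or globs it.
download_all() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue   # skip blank lines
        wget -nc "$url"
    done
}

# Usage (not run here, since it needs network access):
#   extract_urls script_from_ENA.sh | download_all
```

Reading the list with while read instead of for i in `cat ...` avoids word splitting on the URLs, but otherwise it does the same thing as my loop: one wget invocation per file.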
If anyone can offer insight or an explanation, I'd be very grateful. Thanks!