89

I have a list of URLs in a file called urls.txt. Each line contains one URL. I want to download all of the files at once using cURL. I can't seem to get the right one-liner down.

I tried:

$ cat urls.txt | xargs -0 curl -O

But that only gives me the last file in the list.

Finch
  • 11
    `for i in $(cat urls.txt) ; do curl -O $i ; done` – bkconrad Mar 26 '12 at 02:23
  • 1
    Thanks, @bkconrad. I had issues with newlines on Windows though; I fixed it with `tr`: `for i in $(cat urls.txt) ; do curl -O $(echo $i | tr '\r' ' ') ; done` – biphobe Jun 02 '16 at 13:05

6 Answers

145

This works for me:

$ xargs -n 1 curl -O < urls.txt

I'm in FreeBSD. Your xargs may work differently.
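As an aside (not part of the original answer), GNU xargs can also be told to split its input strictly on newlines, which helps if any URL contains characters that plain xargs would treat as delimiters or quotes:

$ xargs -d '\n' -n 1 curl -O < urls.txt

The -d option is a GNU extension; BSD and macOS xargs don't have it.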

Note that this runs sequential curls, which you may view as unnecessarily heavy. If you'd like to save some of that overhead, the following may work in bash:

$ mapfile -t urls < urls.txt
$ curl ${urls[@]/#/-O }

This saves your URL list to an array, then expands the array with options to curl to cause targets to be downloaded. The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1), but it needs the -O option before each one in order to download and save each target. Note that characters within some URLs may need to be escaped to avoid interacting with your shell.
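If escaping is a concern, a minimal sketch of a safer variant (assuming bash 4+ for mapfile) is to build the whole argument list as an array, so each URL stays a single word no matter what it contains:

$ mapfile -t urls < urls.txt
$ args=()
$ for u in "${urls[@]}"; do args+=(-O "$u"); done   # pair every URL with its own -O
$ curl "${args[@]}"

This is the same single-invocation, connection-recycling approach, just with explicit quoting.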

Or if you are using a POSIX shell rather than bash:

$ curl $(printf ' -O %s' $(cat urls.txt))

This relies on printf's behaviour of repeating the format pattern to exhaust the list of data arguments; not all stand-alone printfs will do this. If yours has problems, you might use another tool:

$ curl $(sed 's/^/-O /' < urls.txt)
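As a quick sanity check of that printf format-recycling behaviour (not from the original answer), the format is repeated once per remaining argument:

$ printf ' -O %s' one two three
 -O one -O two -O three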

Note that this non-xargs method also may bump up against system limits for very large lists of URLs. Research ARG_MAX and MAX_ARG_STRLEN if this is a concern.
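For reference (an aside, not in the original answer), you can query ARG_MAX on most systems with:

$ getconf ARG_MAX

The reported value is in bytes and varies between platforms.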

ghoti
  • This seems to work, but it's only giving me a 125 byte HTML file containing the name of the file, **not** the actual file contents. – Finch Mar 26 '12 at 02:56
  • 1
    Ah, I see. There was a redirect involved so I needed to add the `-L` option to `curl`. – Finch Mar 26 '12 at 03:54
  • 5
    Thanks for the hint! That's working on my Mac, but I prefer the pipeline version `cat urls.txt | xargs -n 1 curl -O` ;-) – orzechow Apr 04 '14 at 19:09
  • @Pio, fair enough, it all works, but for your reading pleasure, http://unix.stackexchange.com/questions/16279/should-i-care-about-unnecessary-cats – ghoti Apr 30 '14 at 18:29
  • This worked great! However, I used this in git bash on Windows, and it didn't like `\r` characters in the text file. – James McDonnell Sep 18 '17 at 20:37
  • @JamesMcDonnell - one always needs to adapt to one's environment. (Or replace the environment; YMMV.) If you're in bash, you could `sed $'s/\r$//' urls.txt | xargs -n 1 curl -O` or if you just want to nuke all `\r`s, it may be simpler to: `tr -d '\r' < urls.txt | xargs -n 1 curl -O`. – ghoti Sep 18 '17 at 20:59
  • You won't get keep-alive doing it this way – William Entriken Oct 22 '17 at 00:07
  • @orzechow that's called [useless use of cat](https://en.wikipedia.org/wiki/Cat_(Unix)#UUOC_(Useless_Use_Of_Cat)) – Muayyad Alsadi Jan 11 '18 at 19:32
  • @FullDecent, that's true. I've updated my answer to include an option for connection re-use. – ghoti Mar 08 '18 at 04:55
  • The (new) `$(printf ... $( – dave_thompson_085 Mar 08 '18 at 06:38
  • @dave_thompson_085 - right you are, I added the quotes as an afterthought without testing (). Thanks for pointing this out. I've removed them, and also added references re limits. – ghoti Mar 08 '18 at 12:30
  • Works great on Mac OS Mojave. – norcal johnny Feb 16 '19 at 02:05
  • I tried the second example but it didn't work: `curl: option -O http://example.com: is unknown` – Lamp Apr 16 '20 at 21:17
  • @Lamp, there isn't enough detail in your comment to debug. Try throwing an `echo` at the beginning of the line to see what command line you're actually generating. And perhaps, mind your CRLFs. If nothing seems to work, please do [ask a question](https://stackoverflow.com/questions/ask)! If you refer to it here, I'd be happy to chime in. :-) – ghoti Apr 17 '20 at 02:30
  • @Lamp .. Ah. the quotes. Remove them and it should work. Amazing that after all these years, you're the first person to notice this. :-D – ghoti Apr 19 '20 at 15:21
37

A very simple solution would be the following: if you have a file 'file.txt' like this:

url="http://www.google.de"
url="http://www.yahoo.de"
url="http://www.bing.de"

Then you can use curl and simply do

curl -K file.txt

And curl will fetch all URLs contained in your file.txt!

So if you have control over your input file format, maybe this is the simplest solution for you!
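If your list started life as a plain urls.txt, one way to produce that format is sketched below (the file name curl.conf is just an example, and --remote-name-all assumes a reasonably recent curl, so every URL is saved under its remote name):

sed 's/.*/url = "&"/' urls.txt > curl.conf   # wrap each line as url = "..."
curl --remote-name-all -K curl.conf          # fetch everything, saving by remote name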

Dirk
17

Or you could just do this:

cat urls.txt | xargs curl -O

You only need to use the -I parameter when you want to insert the cat output in the middle of a command.
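For example (a sketch, not part of the original answer), -I is what you'd reach for when the URL has to land somewhere other than the end of the command, such as ahead of another option:

cat urls.txt | xargs -I {} curl -O {} --retry 3

Note that -I implies one curl invocation per input line.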

SamB
user1101791
  • 1
    Not sure why this is voted down, but it works perfectly for me; instead of a flat text file for input I had the output of grep. – rob Mar 12 '15 at 09:12
  • 1
    Probably downvoted because it's wrong. The `-o` option for curl specifies an output file as its argument. Other answers recommend `-O`, which tells curl to determine the local name based on the remote name of the file. – ghoti Nov 11 '15 at 15:43
  • It may not work in case of redirection, when URL returns new location with status 301, 302, etc. To fix that just use `-OL` – Oleksandr Shmyrko Sep 12 '22 at 10:00
8

xargs -P 10 | curl

GNU xargs -P can run multiple curl processes in parallel. E.g. to run 10 processes:

xargs -P 10 -n 1 curl -O < urls.txt

This will speed up the download up to 10x if your maximum download speed is not reached and if the server does not throttle IPs, which is the most common scenario.

Just don't set -P too high or your RAM may be overwhelmed.

GNU parallel can achieve similar results.
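For instance (a syntax sketch, assuming GNU parallel is installed):

parallel -j 10 curl -O {} < urls.txt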

The downside of those methods is that they don't use a single connection for all files, which is what curl does if you pass multiple URLs to it at once, as in:

curl -o out1.txt http://example.com/1 -o out2.txt http://example.com/2

as mentioned at https://serverfault.com/questions/199434/how-do-i-make-curl-use-keepalive-from-the-command-line

Maybe combining both methods would give the best results? But I imagine that parallelization is more important than keeping the connection alive.
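One way to sketch that combination (the batch size of 25 and the --remote-name-all flag are illustrative choices, not from the original answer) is to hand each parallel worker a batch of URLs, so connections are reused within each batch:

xargs -P 4 -n 25 curl --remote-name-all < urls.txt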

See also: Parallel download using Curl command line utility

Ciro Santilli OurBigBook.com
7

Here is how I do it on a Mac (OSX), but it should work equally well on other systems:

What you need is a text file that contains your links for curl

like so:

    http://www.site1.com/subdirectory/file1-[01-15].jpg
    http://www.site1.com/subdirectory/file2-[01-15].jpg
    .
    .
    http://www.site1.com/subdirectory/file3287-[01-15].jpg

In this hypothetical case, the text file has 3287 lines and each line covers 15 pictures (curl expands the [01-15] range).

Let's say we save these links in a text file called testcurl.txt on the top level (/) of our hard drive.

Now we have to go into the terminal and enter the following command in the bash shell:

    for i in `cat /testcurl.txt` ; do curl -O "$i" ; done

Make sure you are using backticks (`). Also make sure the flag (-O) is a capital O and NOT a zero.

With the -O flag, each file is saved under its original (remote) filename.

Happy downloading!

cde
Stefan Gruenwald
  • You should quote your variable references. What if someone planted a file with a special character in your text file? Add a line, `echo ";sudo rm -rf ~/" >> testcurl.txt` and see what happens. – ghoti Mar 31 '14 at 20:22
  • 4
    ^If you don't know, do not do this. – Rick Hanlon II Sep 11 '14 at 13:01
  • 2
    This is a horrible solution; it not only spawns a separate process for each download, but it also has to re-establish the TCP connection every single time, wasting a lot of time on even medium-latency networks. – cnst Aug 15 '15 at 07:01
4

As others have rightly mentioned:

-cat urls.txt | xargs -0 curl -O
+cat urls.txt | xargs -n1 curl -O

However, this paradigm is a very bad idea, especially if all of your URLs come from the same server -- you're not only spawning a new curl instance for each request, but also establishing a new TCP connection for each one, which is highly inefficient, and even more so with the now-ubiquitous https.

Please use this instead:

-cat urls.txt | xargs -n1 curl -O
+cat urls.txt | wget -i/dev/fd/0

Or, even simpler:

-cat urls.txt | wget -i/dev/fd/0
+wget -i/dev/fd/0 < urls.txt

Simplest yet:

-wget -i/dev/fd/0 < urls.txt
+wget -iurls.txt
cnst
  • 2
    The OP was specifically about how to do this with curl. Perhaps this is for use on a system where curl is already installed but wget is not, OSX for example. Also, there's no need to depend on devfs, you can also use `-i-` to refer to stdin. I.e.: `wget -i- < urls.txt` Lastly, if you want `curl` to request multiple URLs at once, without requiring a respawn, you can always just put them on the command line. `xargs curl < urls.txt` does this, using HTTP/1.1. You are limited in the number of URLs by the command line length that xargs can process. Find out this limit with `getconf ARG_MAX`. – ghoti Sep 18 '15 at 14:06