
A page contains links to a set of .zip files, all of which I want to download. I know this can be done with wget or curl. How is it done?

uyetch

3 Answers


The command is:

wget -r -np -l 1 -A zip http://example.com/download/

What the options mean:

-r,  --recursive          specify recursive download.
-np, --no-parent          don't ascend to the parent directory.
-l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
-A,  --accept=LIST        comma-separated list of accepted extensions.
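For readability, the same command can be spelled out with the long option names listed above (the URL is only a placeholder for your download page):

wget --recursive --no-parent --level=1 --accept=zip http://example.com/download/
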
jdphenix
creaktive
  • The `-nd` (no directories) flag is handy if you don't want any extra directories created (i.e., all files will be in the root folder). – Steve Davis Nov 06 '13 at 23:19
  • How do I tweak this solution for it to go deeper from the given page? I tried -l 20, but wget stops immediately. – Wrench Nov 27 '15 at 14:46
  • If the files aren't in the same directory as the starting URL, you might need to get rid of `-np`. If they're on a different host, you'll need `--span-hosts`. – Dan Sep 26 '18 at 15:28
  • Is there a way to keep the directory structure of the website, but exclude the root folder only, such that the current directory is the root folder of the website instead of a folder named after the website's URL? – Aaron Franke Jun 02 '21 at 13:39
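
Putting those comments together: if the .zip files sit on a different host and you want them all in one flat directory, a sketch would drop -np and add -H/--span-hosts plus -nd (the URL is again only a placeholder):

wget -r -l 1 -H -nd -A zip http://example.com/download/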

The solution above does not work for me. Only this one works for me:

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off [url of website]

What the options mean:

-r            recursive
-l1           maximum recursion depth (1=use only this directory)
-H            span hosts (visit other hosts in the recursion)
-t1           Number of retries
-nd           Don't make new directories, put downloaded files in this one
-N            turn on timestamping
-A.mp3        download only mp3s
-erobots=off  execute "robots=off" as if it were part of .wgetrc (i.e., ignore robots.txt)
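
Since the original question asks for .zip files rather than .mp3s, the same command with only the accept pattern swapped would be (URL is a placeholder):

wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off http://example.com/download/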
Richard
K.-Michael Aye

For other scenarios with some parallel magic I use:

curl [url] | grep -i [file ending] | sed -n 's/.*href="\([^"]*\).*/\1/p' | parallel -N5 wget
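
As a concrete sketch, assuming the page's href attributes contain absolute URLs and GNU parallel is installed, downloading every linked .zip file five URLs per wget call could look like this (the page URL is hypothetical):

curl -s https://example.com/download/ | grep -i '\.zip' | sed -n 's/.*href="\([^"]*\)".*/\1/p' | parallel -N5 wget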
M Lindblad