
A page contains links to a set of .zip files, all of which I want to download. I know this can be done with wget or curl. How is it done?

uyetch

3 Answers


The command is:

wget -r -np -l 1 -A zip http://example.com/download/

What the options mean:

-r,  --recursive          specify recursive download.
-np, --no-parent          don't ascend to the parent directory.
-l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
-A,  --accept=LIST        comma-separated list of accepted extensions.
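For readability, the same command can be spelled out with the long option names listed above (the URL is only a placeholder for your download page):

wget --recursive --no-parent --level=1 --accept=zip http://example.com/download/
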
jdphenix
creaktive
  • The `-nd` (no directories) flag is handy if you don't want any extra directories created (i.e., all files will be in the root folder). – Steve Davis Nov 06 '13 at 23:19
  • How do I tweak this solution for it to go deeper from the given page? I tried -l 20, but wget stops immediately. – Wrench Nov 27 '15 at 14:46
  • If the files aren't in the same directory as the starting URL, you might need to get rid of `-np`. If they're on a different host, you'll need `--span-hosts`. – Dan Sep 26 '18 at 15:28
  • Is there a way to keep the directory structure of the website, but exclude the root folder only, such that the current directory is the root folder of the website instead of a folder named after the website's URL? – Aaron Franke Jun 02 '21 at 13:39
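
Putting those comments together: if the .zip files sit on a different host and you want them all in one flat directory, a sketch would drop -np and add -H/--span-hosts plus -nd (the URL is again only a placeholder):

wget -r -l 1 -H -nd -A zip http://example.com/download/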

The solution above does not work for me. Only this one works for me:

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off [url of website]

What the options mean:

-r            recursive
-l1           maximum recursion depth (1=use only this directory)
-H            span hosts (visit other hosts in the recursion)
-t1           Number of retries
-nd           Don't make new directories, put downloaded files in this one
-N            turn on timestamping
-A.mp3        download only mp3s
-erobots=off  execute "robots=off" as if it were part of .wgetrc (i.e., ignore robots.txt)
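
Since the original question asks for .zip files rather than .mp3s, the same command with only the accept pattern swapped would be (URL is a placeholder):

wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off http://example.com/download/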
Richard
K.-Michael Aye

For other scenarios with some parallel magic I use:

curl [url] | grep -i [file ending] | sed -n 's/.*href="\([^"]*\).*/\1/p' | parallel -N5 wget
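
As a concrete sketch, assuming the page's href attributes contain absolute URLs and GNU parallel is installed, downloading every linked .zip file five URLs per wget call could look like this (the page URL is hypothetical):

curl -s https://example.com/download/ | grep -i '\.zip' | sed -n 's/.*href="\([^"]*\)".*/\1/p' | parallel -N5 wget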
M Lindblad