I'm using the wget program, but I don't want it to save the HTML file it downloads; I want the file discarded after it is received. How do I do that?

- I'm new to Linux – would the `/dev/null` thing work? – Ram Rachum Oct 10 '09 at 02:23
- So what's the point of downloading it then? – Anonymous Dec 07 '09 at 14:59
- @Anonymous I assume to stress the remote server. If you don't care about the content, I'd probably use apachebench (ab), though. – Tom O'Connor Dec 06 '10 at 12:52
9 Answers
You can redirect the output of wget to /dev/null (or NUL on Windows):
wget http://www.example.com -O /dev/null
The file won't be written to disk, but it will be downloaded.
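If you also want a quick "is the site up" check from a script (a minimal sketch, assuming a POSIX shell; example.com is just a placeholder), you can add -q and test wget's exit status:
wget -q -O /dev/null http://www.example.com && echo "fetched OK"
wget exits non-zero on network and server errors, so the echo should only run when the download succeeded.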

- This doesn't save the page, but it sends email to me. Is it also possible to disable the emailing? – trante Sep 07 '13 at 06:03
If you don't want to save the file and have accepted the solution of downloading the page to /dev/null, I suppose you are using wget not to fetch and parse the page contents.
If your real need is to trigger some remote action, check that the page exists, and so on, I think it would be better to avoid downloading the HTML body at all.
Play with wget's options to retrieve only what you really need, i.e. HTTP headers, request status, etc.
Assuming you need to check that the page is OK (i.e., the returned status is 200), you can do the following:
wget --no-cache --spider http://your.server.tld/your/page.html
If you want to parse the headers returned by the server, do the following:
wget --no-cache -S http://your.server.tld/your/page.html
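Since wget prints those headers on stderr, a quick sketch for pulling out just the status line could be:
wget --no-cache -S -O /dev/null http://your.server.tld/your/page.html 2>&1 | grep 'HTTP/'
The -O /dev/null keeps the body off the disk while the headers are still printed.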
See the wget man page for further options to play with.
See lynx, too, as an alternative to wget.

- I'm confused. `--no-cache` in the man page says it causes wget to "send the remote server an appropriate directive (‘Pragma: no-cache’) to get the file from the remote service". – Gaia Jan 20 '13 at 19:05
- It tells the server that your client doesn't want a cached version of the file; we want to get the very latest release of the resource we are requesting. – drAlberT Jan 21 '13 at 15:55
In case you also want to print the result to the console, you can do:
wget -qO- http://www.example.com
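As a usage sketch (the URL is only a placeholder), this is handy when you want to pipe the page straight into another tool without ever touching the disk:
wget -qO- http://www.example.com | grep -i '<title>'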

- I like this option best. It lets me see what it gets but doesn't save it. The switches are specifically `q` (quiet mode, so it doesn't output progress and other info) and `O-` (write the retrieved document to the console). – Octopus Sep 30 '16 at 21:16
$ wget http://www.somewebsite.com -O foo.html --delete-after

- Thanks a lot. The `--delete-after` option is the choice when you have to download recursively but you want to discard the actual content. – egelev Apr 23 '15 at 10:57
- +1 from me; the command is intuitive. At a glance, I can more quickly comprehend what's going to happen than with `-O /dev/null`. – fusion27 Oct 17 '19 at 11:28
Another alternative is to use a tool like curl, which by default outputs the remote content to stdout instead of saving it to a file.
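For instance (a minimal sketch; the URL is only a placeholder), you can silence the progress meter, discard the body, and print just the HTTP status code:
curl -s -o /dev/null -w '%{http_code}\n' http://www.example.com
Here -s suppresses the progress output, -o /dev/null throws the body away, and -w prints the status code once the transfer finishes.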

Check out the `--spider` option. I use it to make sure my web sites are up, and to send me an email if they're not. This is a typical entry from my crontab:
46 */2 * * * if ! wget -q --spider http://www.rochesterflyingclub.com/ >/dev/null 2>&1; then echo "Rochester Flying Club site is down" ; fi

If you need to crawl a website using wget and want to minimize disk churn...
For a *NIX box using wget, I suggest skipping writing to a file. I noticed on my Ubuntu 10.04 box that `wget -O /dev/null` caused wget to abort downloads after the first download.
I also noticed that `wget -O real-file` causes wget to forget the actual links on the page. It insists on an `index.html` being present on each page. Such pages may not always be present, and wget will not remember links it has seen previously.
For crawling without writing to disk, the best I came up with is the following:
mkdir /dev/shm/1
cd /dev/shm/1
wget --recursive --relative --no-parent ...
Notice there is no `-O file` option. wget will write to the `$PWD` directory, which in this case is a RAM-only tmpfs file system. Writing here should bypass disk churn (depending on swap space) AND keep track of all links. This should crawl the entire website successfully.
Afterward, of course,
rm --recursive --force /dev/shm/1/*
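A small variation (a sketch, assuming /dev/shm is a tmpfs mount, as on most modern Linux systems) is to let mktemp pick a unique directory so parallel runs don't clobber each other:
dir=$(mktemp -d -p /dev/shm)
cd "$dir"
wget --recursive --relative --no-parent http://your.server.tld/
cd / && rm --recursive --force "$dir"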

Use the --delete-after option, which deletes the file after it is downloaded.
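A minimal sketch (the URL is only a placeholder):
wget --delete-after http://www.example.com/
The page is fetched as usual and then removed from disk once the download finishes.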
Edit: Oops, I just noticed that has already been answered.

According to the help doc (`wget -h`), you can use the `--spider` option to skip the download (version 1.14).
Download:
-S, --server-response print server response.
--spider don't download anything.
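Combining the two flags shown above gives a header-only check (a sketch; the URL is only a placeholder) that prints the server response without downloading anything:
wget --spider -S http://www.example.com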

- How does this add to the other answer that mentions `--spider`? – Ward - Trying Codidact May 09 '19 at 04:37