It does not seem possible to achieve my goal with current versions of `wget`.
After studying the source code of `wget` version 1.18, I came to these conclusions:
`wget` cannot recurse if it does not store the downloaded files, at least temporarily, as it does for `--spider`.
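For example, a recursive spider run looks like this (`example.com` is a placeholder URL); each HTML file is still written to disk long enough to be parsed for links before being removed:

```sh
# Recursive spidering: wget must store each page temporarily
# so it can extract links from it, then deletes it.
wget --spider -r -l 2 https://example.com/
```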
When passed `-O filename`, it keeps appending to `filename` and reparses the whole file after each download, loading it completely into memory (or mapping it). This is very cumbersome and inefficient.
When passed `-O-`, it pipes the downloaded file to `stdout` and attempts to reload `-` to look for more URLs to fetch, which causes `stdin` to be read for this purpose. This is a side effect of the implementation.
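The side effect is easy to trigger (placeholder URL again): `wget` writes the page body to `stdout`, then tries to read `-`, i.e. its own `stdin`, when looking for links to follow; redirecting `/dev/null` into `stdin` keeps the command from blocking:

```sh
# The page body goes to stdout, but for recursion wget re-reads
# "-", which resolves to stdin instead of the data it just wrote.
wget -r -l 1 -O - https://example.com/ < /dev/null
```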
I wrote a patch that adds a more sensible piping option: it relies on `--spider` to download the HTML and CSS files needed for recursive operation, and pipes only these files before they are removed. I will publish the patch once it is reasonably tested and documented.
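To make the intent concrete, a purely hypothetical invocation of the patched build might look like the following; the `--spider-pipe` option and the `process-pages` consumer are invented names for illustration, not part of any released `wget`:

```sh
# Hypothetical flag (from my unpublished patch): recurse in spider
# mode and stream each HTML/CSS file to stdout before deletion.
wget --spider -r --spider-pipe https://example.com/ | ./process-pages
```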