I'm crawling a forum and keep stumbling across certain threads that have been going on for ten years.

I can certainly exclude these using wget's -X option:

-X /t/41866,/t/314849,/t/335041,/t/356321,/t/491462,/t/493609,/t/493655,/t/493667,/t/493668,/t/493676,/t/493678

I can also exclude them by putting the string in the wgetrc file.

But what I'd like to do is point wget at a file that contains the string, the way the -i option reads the URLs of interest from a file.

So instead of supplying the string directly, as in this wgetrc command (from the GNU wget 1.11.4 manual):

exclude_directories = string Specify a comma-separated list of directories you wish to exclude from download, the same as ‘-X string’

I'd like the string to actually pull in the contents of a file. Is there a way to do this?

mcwizard
  • Welcome to Server Fault! Your question is off topic for Server Fault because it doesn't appear to relate to servers/networking or desktop infrastructure in a professional environment. It may be on topic for [Superuser](http://superuser.com) but please [search](http://superuser.com/search) their site for similar questions that may already have the answer you're looking for. – Dennis Kaarsemaker Feb 17 '13 at 20:28

3 Answers

You can use the -I list or --include-directories=list option:

   -I list
   --include-directories=list
       Specify a comma-separated list of directories you wish to follow when downloading.  Elements
       of list may contain wildcards.
Daniel t.
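wget -X `perl -MFile::Slurp -e '@lines=read_file("./FILE.txt"); chomp @lines; print join ",", @lines'`

(You may need to install the File::Slurp Perl module. Note chomp rather than chop: chop would cut a real character off the last line if the file does not end in a newline.)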
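
If installing a Perl module is not convenient, the same join can probably be done with standard tools. This is only a sketch: `join_lines` is a made-up helper name, and it assumes the file holds one directory per line.

```shell
# Hypothetical helper: join the lines of a file into one
# comma-separated string, dropping blank lines first.
# Assumes one directory per line (e.g. /t/41866).
join_lines() {
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}
```

It would then be used as `wget -X "$(join_lines ./FILE.txt)" URL`, where URL stands for whatever you are crawling.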

porton

You could always just use the shell:

wget -X `head -n1 exclude_file` blah

The head -n1 keeps only the first line of the file, in case it contains extra lines; the backticks already strip a trailing newline on their own.
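
A quick way to check how the shell treats trailing newlines here. This is just a sketch using a throwaway temp file in place of a real exclude file:

```shell
# Throwaway file standing in for exclude_file, ending in two newlines.
exclude_file=$(mktemp)
printf '/t/41866,/t/314849\n\n' > "$exclude_file"

# Command substitution drops every trailing newline,
# so the value is clean even without head -n1.
list=$(cat "$exclude_file")
printf '[%s]\n' "$list"    # prints [/t/41866,/t/314849]

rm -f "$exclude_file"
```

So head -n1 mostly matters when the file has more than one line of content, since a newline in the middle of the list would still break the -X argument.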

R. S.