I'm starting with page:

https://mysite/a

I'd like to spider the page and get the full URLs of any nested pages below it that begin with the same stem (like https://mysite/a/b).

I've tried:

$ wget -r --spider --accept-regex "https://...*" 'https://.../' 2>test.txt

which produces a large amount of output, including what appear to be the URLs I'm after, like:

--2018-04-21 15:04:48--  https://mysite/a/
Reusing existing connection to mysite:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'a/index.html.tmp.tmp'

How do I just print out a list of the urls?

Edit:

changed it to

$ wget -r --spider  'https://mysite/a/' |grep 'https://mysite/a*' 2>test.txt

as a test. No output is being saved in test.txt; the file is empty.
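
A possible explanation for the empty file, offered as a sketch rather than a confirmed fix: wget writes its log to stderr, while in the pipeline above `2>test.txt` redirects grep's stderr, so grep receives nothing on stdin and nothing lands in the file. Also, as a regex, `a*` means "zero or more `a`", not "anything after `a`". The snippet below merges stderr into stdout and pulls the URL out of each `--timestamp--  URL` log line. `extract_urls` is a hypothetical helper name, and the field position assumes the log format shown earlier.

```shell
# Hypothetical helper: read wget log text on stdin and print the unique URLs
# that start with the given prefix. Assumes request lines look like
#   --2018-04-21 15:04:48--  https://mysite/a/
# so the URL is the third whitespace-separated field.
extract_urls() {
  awk -v prefix="$1" '/^--/ && index($3, prefix) == 1 { print $3 }' | sort -u
}

# Usage (not run here; 2>&1 sends wget's stderr log into the pipe):
# wget -r --spider 'https://mysite/a/' 2>&1 | extract_urls 'https://mysite/a' > test.txt
```

Matching on a literal prefix with `index` sidesteps regex escaping of the dots and slashes in the URL.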

user1592380